The Hidden Convex Optimization Landscape of Deep Neural Networks
Mert Pilanci, Stanford University
Abstract: Deep neural network training problems are inherently non-convex, so the recent dramatic success of deep learning relies largely on non-convex optimization heuristics and experimental findings. Despite significant advances, the non-convex nature of neural network training poses two central challenges: first, understanding the underlying mechanisms that drive model performance, and second, achieving efficient training with low computational cost and energy consumption. The performance of non-convex models is notably sensitive to the choice of optimization method and hyperparameters, including initialization, mini-batching, and step sizes. In contrast, convex optimization problems are robust to these choices, allowing globally optimal solutions to be reached efficiently and consistently, irrespective of optimization parameters.
In this talk, we explore a novel perspective by examining multilayer neural networks equipped with ReLU activation functions through the framework of convex optimization. We introduce exact convex optimization formulations of ReLU network training problems. We show that two-layer ReLU networks can be globally trained via convex programs whose number of variables is polynomial in the number of training samples, the feature dimension, and the number of hidden neurons. We show that our analysis extends to deeper networks and that these convex programs possess an intuitive geometric interpretation. Our results provide an equivalent characterization of neural networks as convex models in which a mixture of locally linear models is fitted to the data with sparsity-inducing convex regularization. Moreover, we show that standard convolutional neural networks can be globally optimized in fully polynomial time. We discuss extensions to batch normalization, generative adversarial networks, and transformers. Finally, we present numerical simulations verifying our claims and illustrating that the proposed convex approach is faster and more reliable than standard local search heuristics such as SGD and its variants.
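To make the flavor of such a convex reformulation concrete, the sketch below illustrates (in Python with cvxpy) one way the two-layer ReLU training problem can be posed as a convex program over ReLU activation patterns of the data, with a group-sparse penalty playing the role of the sparsity-inducing regularization mentioned above. This is a minimal illustration, not the speaker's implementation: the data X and y are random placeholders, the activation patterns are subsampled rather than enumerated exhaustively, and the variable names and hyperparameters are assumptions chosen for readability.

# Minimal sketch (assumed, not the speaker's code) of a convex program for
# two-layer ReLU network training over sampled activation patterns.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, P, beta = 50, 5, 30, 1e-3          # samples, features, sampled patterns, regularization

X = rng.standard_normal((n, d))           # placeholder training data
y = rng.standard_normal(n)                # placeholder targets

# Sample diagonal activation patterns D_i = 1[X g_i >= 0] via random directions g_i.
G = rng.standard_normal((d, P))
D = (X @ G >= 0).astype(float)            # n x P matrix of ReLU on/off patterns
D = np.unique(D, axis=1)                  # keep distinct patterns only
P = D.shape[1]

V = cp.Variable((d, P))                   # weights of "positive" neurons per pattern
U = cp.Variable((d, P))                   # weights of "negative" neurons per pattern

# Model output is a sum of locally linear pieces, one per activation pattern.
residual = cp.sum(cp.multiply(D, X @ (V - U)), axis=1) - y
# Sum of Euclidean norms: a group-sparsity-inducing convex regularizer.
reg = cp.sum(cp.norm(V, 2, axis=0) + cp.norm(U, 2, axis=0))
# Constraints enforce that each weight vector is consistent with its pattern.
constraints = [cp.multiply(2 * D - 1, X @ V) >= 0,
               cp.multiply(2 * D - 1, X @ U) >= 0]

prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(residual) + beta * reg), constraints)
prob.solve()
print("optimal objective:", prob.value)

Because the objective and constraints are convex, any solver returns the same globally optimal solution regardless of initialization, in contrast to the local search heuristics discussed in the abstract.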
Mert Pilanci is an assistant professor of Electrical Engineering at Stanford University. He received his Ph.D. in Electrical Engineering and Computer Science from UC Berkeley in 2016. Prior to joining Stanford, he was an assistant professor of Electrical Engineering and Computer Science at the University of Michigan. In 2017, he was a Math+X postdoctoral fellow working with Emmanuel Candès at Stanford University. Mert's research interests are in neural networks, machine learning, optimization, and signal processing. His group develops theory and algorithms for solving large-scale optimization problems in machine learning. His research also seeks to develop safe and interpretable artificial intelligence and information-theoretic foundations of distributed computing.