Optimization for Overparametrized Models

Recently, there has been a surge of interest in developing optimization algorithms for overparameterized models, since achieving generalization is believed to require algorithms with suitable biases. Much of this interest centers on minimizing the sharpness of the original loss function, and the Sharpness-Aware Minimization (SAM) algorithm has proven effective for this purpose. However, the existing literature focuses on only a few sharpness measures (such as the maximum eigenvalue or trace of the training loss Hessian), which may not yield meaningful insights in non-convex optimization settings (e.g., neural networks). Moreover, many sharpness measures are sensitive to parameter invariances in neural networks; for example, they can grow significantly when the parameters are rescaled, even though the function computed by the network may be unchanged. Hence, this work introduces a new class of sharpness measures, leading to new sharpness-aware objective functions. The authors prove that these measures are universally expressive, allowing any function of the training loss Hessian matrix to be represented by choosing appropriate hyperparameters. Furthermore, they show that the proposed objective functions explicitly bias towards minimizing their corresponding sharpness measures. Finally, they demonstrate how the structure of this new class allows meaningful applications to overparameterized models with non-convex objective functions, as well as to models with parameter invariances, such as scale-invariant neural networks.
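To make the sharpness-aware setup concrete, the short Python sketch below implements a standard SAM-style ascent-then-descent update on a toy quadratic loss. This is only an illustration of the generic SAM update, not the new objective functions proposed in this work; the loss, step size, and perturbation radius rho are placeholder choices.

import numpy as np

def loss(w):
    # Toy quadratic training loss (illustrative placeholder).
    return 0.5 * w @ w

def grad(w):
    # Gradient of the toy loss.
    return w

def sam_step(w, rho=0.05, lr=0.1):
    g = grad(w)
    # Ascent step: first-order approximation of the worst-case
    # perturbation within an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent step: update using the gradient at the perturbed point.
    return w - lr * grad(w + eps)

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w)
print(w, loss(w))  # approaches the minimizer at the origin

The key design choice in SAM is that the descent direction is evaluated at the adversarially perturbed point w + eps rather than at w itself, which biases training toward regions where the loss stays low under small parameter perturbations.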

Team Members

Stefanie Jegelka1
Melvin Leok2
Arya Mazumdar2
Suvrit Sra1
Nisheeth Vishnoi3
Yusu Wang2

Collaborators

Dara Bahri4
Patrick Jaillet2

1. MIT
2. UC San Diego
3. Yale
4. Google Research

Publications

ICML 2024