[CANCELED] Optimization for ML and AI Seminar: Fantastic Pretraining Optimizers and Where to Find Them

HDSI 123 and Virtual
3234 Matthews Ln, La Jolla

Tengyu Ma, Stanford

Abstract: AdamW has long been the dominant optimizer in language model pretraining, despite numerous claims that alternative optimizers offer 1.4 to 2x speedups. We posit that two methodological shortcomings have obscured fair comparisons and hindered practical adoption: (i) unequal hyperparameter tuning and (ii) limited or misleading evaluation setups. To address these two […]