One Small Step, One Giant Leap: From Test-Time Tweaks to Global Guarantees

Mahdi Soltanolkotabi, USC

Simple first-order methods like Gradient Descent (GD) remain foundational to modern machine learning. Yet, despite their widespread use, our theoretical understanding of the GD trajectory—how and why it works—remains incomplete in both classical and contemporary settings. This talk explores new horizons in understanding the behavior and power of GD across two distinct but connected fronts.

In the first part, we examine the surprising power of a single gradient step in enhancing model reasoning. We focus on test-time training (TTT)—a gradient-based approach that adapts model parameters using individual test instances. We introduce a theoretical framework that reveals how TTT can effectively handle distribution shifts and significantly reduce the data required for in-context learning, shedding light on why such simple methods often outperform expectations.
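As a rough illustration of the mechanism (not the construction analyzed in the talk), the sketch below adapts a copy of a model with a single gradient step on one test instance before predicting; the model, self-supervised loss, and learning rate are illustrative placeholders.

import copy
import torch

def ttt_predict(model, x_test, self_supervised_loss, lr=1e-2):
    # Adapt a throwaway copy so the original parameters stay untouched.
    adapted = copy.deepcopy(model)
    adapted.train()
    optimizer = torch.optim.SGD(adapted.parameters(), lr=lr)

    # One gradient step on a self-supervised loss computed from the single
    # test instance (e.g., a reconstruction loss); this is the "test-time
    # training" step.
    loss = self_supervised_loss(adapted, x_test)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Predict with the adapted parameters.
    adapted.eval()
    with torch.no_grad():
        return adapted(x_test)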

The second part turns to a more classical optimization setting: learning shallow neural networks with GD. Despite extensive study, rigorous performance guarantees are lacking even for fitting a one-hidden-layer model to basic target functions. We present a comprehensive analysis of the GD trajectory in this regime, showing how it avoids suboptimal stationary points and converges efficiently to global optima. Our results offer new theoretical foundations for understanding how GD succeeds in the presence of suboptimal stationary points.
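For concreteness, the following minimal sketch runs full-batch gradient descent on a one-hidden-layer ReLU network fit to a simple planted target; the width, step size, and target here are illustrative choices rather than the precise setting of the analysis.

import torch

torch.manual_seed(0)
n, d, k = 256, 5, 32                       # samples, input dimension, hidden width
X = torch.randn(n, d)
y = torch.relu(X @ torch.randn(d, 1))      # a simple planted target function

model = torch.nn.Sequential(
    torch.nn.Linear(d, k),
    torch.nn.ReLU(),
    torch.nn.Linear(k, 1),
)
lr = 0.05

for step in range(2000):
    loss = torch.mean((model(X) - y) ** 2)  # squared loss over the full batch
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():        # plain GD update: p <- p - lr * grad(p)
            p -= lr * p.grad

print(f"final training loss: {loss.item():.4f}")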

