By Angela Berti, 31 March 2025

Single location regression and attention-based models

Claire Boyer, Université Paris-Saclay

Attention-based models, such as Transformers, excel across a wide range of tasks but still lack a comprehensive theoretical understanding, especially regarding token-wise sparsity and internal linear representations. To address this gap, we introduce the single-location regression task, in which only one token in a sequence determines the output and its position is a latent random variable, retrievable via a linear projection of the input. To solve this task, we propose a dedicated predictor, which turns out to be a simplified version of a non-linear self-attention layer. We study its theoretical properties by showing its asymptotic Bayes optimality and analyzing its training dynamics. In particular, despite the non-convex nature of the problem, the predictor effectively learns the underlying structure. This work highlights the capacity of attention mechanisms to handle sparse token information and internal linear structures.
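As a rough illustration of the setup described in the abstract, the toy sketch below generates sequences in which a single token at a hidden position determines the output, and then reads that token out with one attention-style layer. The Gaussian tokens, the mean shift used to mark the informative token, the softmax weighting, and the names `sample_task`, `attention_predictor`, `k_star`, `v_star`, `shift`, and `temperature` are all assumptions made for this sketch; the abstract does not specify the data model or the exact form of the predictor, so this should not be read as the construction presented in the talk.

```python
import numpy as np

def sample_task(n, L, d, k_star, v_star, shift=3.0, noise=0.0, rng=None):
    """Draw n sequences of L tokens in R^d for a toy single-location task.

    Assumption (not from the abstract): tokens are standard Gaussian, and the
    informative token is marked by a mean shift along the direction k_star.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = rng.standard_normal((n, L, d))
    J = rng.integers(0, L, size=n)            # latent position, uniform over tokens
    rows = np.arange(n)
    X[rows, J] += shift * k_star              # linear "marker" on the chosen token
    Y = X[rows, J] @ v_star + noise * rng.standard_normal(n)
    return X, Y, J

def attention_predictor(X, k, v, temperature=1.0):
    """Softmax attention over tokens: scores <x_l, k>, values <x_l, v>."""
    scores = X @ k / temperature              # shape (n, L)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    values = X @ v                            # shape (n, L)
    return (weights * values).sum(axis=1), weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, L, n = 16, 5, 200
    k_star = np.zeros(d); k_star[0] = 1.0     # hypothetical "location" direction
    v_star = np.zeros(d); v_star[1] = 1.0     # hypothetical "value" direction
    X, Y, J = sample_task(n, L, d, k_star, v_star, shift=8.0, rng=rng)
    pred, weights = attention_predictor(X, k_star, v_star, temperature=0.1)
    print("fraction of positions recovered:", (weights.argmax(axis=1) == J).mean())
    print("mean squared error:", np.mean((pred - Y) ** 2))
```

In this sketch the predictor is handed the true directions, so it only shows that an attention-style readout can represent the solution when the marker is strong. It says nothing about the training dynamics or the optimality results mentioned in the abstract.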


TILOS Seminar Series

attention mechanisms, sparse token learning
