TILOS Seminar: Transformers learn in-context by (functional) gradient descent

Xiang Cheng, TILOS Postdoctoral Scholar at MIT

Abstract: Motivated by the in-context learning phenomenon, we investigate how the Transformer neural network can implement learning algorithms in its forward pass. We show that a linear Transformer naturally learns to implement gradient descent, which enables it to learn linear functions in-context. More generally, we show that a non-linear Transformer can implement functional gradient descent with respect to some RKHS metric, which allows it to learn a broad class of functions in-context. Additionally, we show that the RKHS metric is determined by the choice of attention activation, and that the optimal choice of attention activation depends in a natural way on the class of functions that need to be learned. I will end by discussing some implications of our results for the choice and design of Transformer architectures.
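A minimal numerical sketch of the linear case (an illustrative setup assuming identity key/query matrices and gradient descent started from w = 0, not necessarily the exact construction from the talk): one gradient step on the in-context least-squares objective yields the same prediction on a query point as a single linear-attention readout over the context. In the non-linear case the analogous update is, roughly, a functional gradient step f <- f - (eta/n) * sum_i (f(x_i) - y_i) K(x_i, .), with the kernel K induced by the attention activation.

```python
# Illustrative sketch (assumptions: identity key/query matrices, gradient
# descent started from w = 0); not the exact construction from the talk.
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 5, 20, 0.1

# In-context examples (x_i, y_i) generated by a hidden linear function.
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star
x_query = rng.normal(size=d)

# One gradient-descent step on L(w) = (1/2n) * sum_i (w^T x_i - y_i)^2, from w = 0:
# w_1 = (eta/n) * sum_i y_i x_i, so the prediction is w_1^T x_query.
w_gd = (eta / n) * (X.T @ y)
pred_gd = w_gd @ x_query

# Linear-attention readout: each context value y_i is weighted by the
# unnormalized attention score x_i^T x_query and the results are summed.
pred_attn = (eta / n) * np.sum(y * (X @ x_query))

print(np.allclose(pred_gd, pred_attn))  # True: the two predictions coincide
```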

Date

Apr 17 2024

Time

10:00 - 11:00 (Pacific Daylight Time)

Location

Halıcıoğlu Data Science Building Room 123
3234 Matthews Ln, La Jolla, CA 92093

Organizer

TILOS

Speaker