
TILOS Seminar: Transformers learn in-context by (functional) gradient descent
HDSI 123 and Virtual, 3234 Matthews Ln, La Jolla, CA, United States
Xiang Cheng, TILOS Postdoctoral Scholar, MIT
Abstract: Motivated by the in-context learning phenomenon, we investigate how the Transformer neural network can implement learning algorithms in its forward pass. We show that a linear Transformer naturally learns to implement gradient descent, which enables it to learn linear functions in-context. More generally, we show that a non-linear […]
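
The abstract's claim that a linear Transformer can implement gradient descent can be illustrated with a small numerical check. What follows is a minimal sketch, not the speaker's construction. It assumes a single softmax-free attention layer whose weights are set by hand rather than learned, with the in-context inputs x_i as keys, the labels y_i as values, and the test input as the query. The function names and the step size eta are hypothetical. Under those assumptions, the attention output should equal the prediction obtained after one gradient descent step from w = 0 on the least-squares loss.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def gd_one_step_prediction(xs, ys, x_query, eta):
    """Predict at x_query after one gradient step from w = 0 on
    L(w) = 1/(2n) * sum_i (w . x_i - y_i)^2."""
    n = len(xs)
    dim = len(x_query)
    # The gradient at w = 0 is -(1/n) * sum_i y_i * x_i,
    # so one step gives w_1 = (eta / n) * sum_i y_i * x_i.
    w = [eta / n * sum(y * x[j] for x, y in zip(xs, ys)) for j in range(dim)]
    return dot(w, x_query)


def linear_attention_prediction(xs, ys, x_query, eta):
    """Softmax-free attention with hand-set weights (an assumption):
    query = x_query, keys = x_i, values = y_i, output scaled by eta / n."""
    n = len(xs)
    return eta / n * sum(y * dot(x, x_query) for x, y in zip(xs, ys))


# Synthetic in-context examples drawn from a random linear function.
random.seed(0)
dim, n, eta = 3, 8, 0.1
w_true = [random.gauss(0, 1) for _ in range(dim)]
xs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
ys = [dot(w_true, x) for x in xs]
x_query = [random.gauss(0, 1) for _ in range(dim)]

pred_gd = gd_one_step_prediction(xs, ys, x_query, eta)
pred_attn = linear_attention_prediction(xs, ys, x_query, eta)

# The two predictions should agree up to floating-point rounding.
print(abs(pred_gd - pred_attn) < 1e-9)
```

The sketch only shows that such a layer can express one gradient step. Whether training actually finds these weights is the question the talk addresses.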