BEGIN:VCALENDAR
VERSION:2.0
PRODID:-// - ECPv6.15.18//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-ORIGINAL-URL:https://tilos.ai
X-WR-CALDESC:Events for 
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20230312T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20231105T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20240310T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20241103T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20250309T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20251102T090000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240417T100000
DTEND;TZID=America/Los_Angeles:20240417T110000
DTSTAMP:20260404T145833
CREATED:20250828T201326Z
LAST-MODIFIED:20250828T201326Z
UID:7309-1713348000-1713351600@tilos.ai
SUMMARY:TILOS Seminar: Transformers learn in-context by (functional) gradient descent
DESCRIPTION:Xiang Cheng\, TILOS Postdoctoral Scholar\, MIT \nAbstract: Motivated by the in-context learning phenomenon\, we investigate how the Transformer neural network can implement learning algorithms in its forward pass. We show that a linear Transformer naturally learns to implement gradient descent\, which enables it to learn linear functions in-context. More generally\, we show that a non-linear Transformer can implement functional gradient descent with respect to some RKHS metric\, which allows it to learn a broad class of functions in-context. Additionally\, we show that the RKHS metric is determined by the choice of attention activation\, and that the optimal choice of attention activation depends in a natural way on the class of functions that need to be learned. I will end by discussing some implications of our results for the choice and design of Transformer architectures.
URL:https://tilos.ai/event/tilos-seminar-transformers-learn-in-context-by-functional-gradient-descent/
LOCATION:HDSI 123 and Virtual\, 3234 Matthews Ln\, La Jolla\, CA\, 92093\, United States
CATEGORIES:TILOS Seminar Series
ATTACH;FMTTYPE=image/jpeg:https://tilos.ai/wp-content/uploads/2023/10/cheng-xiang.jpg
END:VEVENT
END:VCALENDAR