
TILOS Seminar: Single location regression and attention-based models
HDSI 123 and Virtual, 3234 Matthews Ln, La Jolla, CA, United States
Claire Boyer, Université Paris-Saclay
Abstract: Attention-based models, such as the Transformer, excel across various tasks but lack a comprehensive theoretical understanding, especially regarding token-wise sparsity and internal linear representations. To address this gap, we introduce the single-location regression task, where only one token in a sequence determines the output, and its position is a latent random […]