TILOS Seminar: Single location regression and attention-based models

HDSI 123 and Virtual, 3234 Matthews Ln, La Jolla

Claire Boyer, Université Paris-Saclay

Abstract: Attention-based models, such as Transformers, excel across various tasks but lack a comprehensive theoretical understanding, especially regarding token-wise sparsity and internal linear representations. To address this gap, we introduce the single-location regression task, where only one token in a sequence determines the output, and its position is a latent random […]