Published 30 January 2025 at 838 × 631 in How Transformers Learn Causal Structure with Gradient Descent