Keywords: transformer, training dynamics, phase transition, lazy and rich, induction head
Abstract: Transformers have demonstrated exceptional in-context learning capabilities, yet the theoretical understanding of the underlying mechanisms remains limited.
A recent work [Elhage et al., 2021] identified a "rich" in-context mechanism known as the induction head, contrasting with "lazy" $n$-gram models that overlook long-range dependencies.
In this work, we provide a *dynamics analysis* of how transformers transition from the lazy to the rich mechanism.
Specifically, we study the training dynamics on a synthetic mixed target, composed of a 4-gram and an in-context 2-gram component. This controlled setting allows us to precisely characterize the entire training process and uncover an *abrupt transition* from lazy (4-gram) to rich (induction head) mechanisms as training progresses.
The theoretical insights are validated experimentally in both synthetic and real-world settings.
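To make the synthetic setting concrete, below is a minimal, hypothetical sketch (not the authors' actual data pipeline) of a mixed target of the kind described: each next token is drawn either from a fixed 4-gram table or from an in-context 2-gram (induction-head-style copy) rule. The vocabulary size, sequence length, and mixing weight are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: sample sequences from a mixture of a 4-gram component
# and an in-context 2-gram (copy) component. All constants are assumptions.
VOCAB, SEQ_LEN, MIX = 16, 64, 0.5
rng = np.random.default_rng(0)

# Fixed random 4-gram transition table: P(next token | previous 3 tokens).
four_gram = rng.dirichlet(np.ones(VOCAB), size=(VOCAB, VOCAB, VOCAB))

def sample_sequence():
    seq = list(rng.integers(0, VOCAB, size=3))  # random 3-token prefix
    for _ in range(SEQ_LEN - 3):
        if rng.random() < MIX:
            # 4-gram component: draw the next token from the fixed table.
            probs = four_gram[seq[-3], seq[-2], seq[-1]]
            seq.append(int(rng.choice(VOCAB, p=probs)))
        else:
            # In-context 2-gram component: copy the token that followed the
            # most recent earlier occurrence of the current token.
            cur = seq[-1]
            prev = [i for i in range(len(seq) - 1) if seq[i] == cur]
            if prev:
                seq.append(seq[prev[-1] + 1])
            else:
                seq.append(int(rng.integers(0, VOCAB)))  # fallback: uniform
    return np.array(seq)

print(sample_sequence())
```

Predicting the in-context component requires attending to earlier occurrences of the current token (the induction-head mechanism), whereas the 4-gram component can be fit by a local, "lazy" statistic.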
Student Paper: Yes
Submission Number: 27