Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective
Keywords: Systems Neuroscience, Encoding Models, fMRI, Large Language Models, Learning Dynamics, Mechanistic Interpretability
Abstract: While large language models (LLMs) acquire diverse capabilities during training, their internal learning dynamics remain poorly understood. To address this, we adopt a neuroscientific perspective and analyze three interrelated dimensions: the similarity between LLMs and the human brain, the internal states of LLMs, and downstream task performance. Across models varying in data and architecture, we identify three phase transitions during training: (1) alignment with the whole brain surges as LLMs begin adhering to task instructions (Brain Alignment and Instruction Following), (2) unexpectedly, LLMs diverge from the brain during a period in which downstream task accuracy temporarily stagnates (Brain Detachment and Stagnation), and (3) alignment with the brain re-emerges as LLMs become capable of solving the downstream tasks (Brain Realignment and Consolidation). These findings illuminate the underlying mechanisms of LLM training, while opening new avenues for interdisciplinary research bridging AI and neuroscience.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: hierarchical & concept explanations, probing
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English, Japanese
Submission Number: 1702