Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective
Keywords: Systems Neuroscience, Encoding Models, fMRI, Large Language Models, Learning Dynamics, Mechanistic Interpretability
Abstract: While large language models (LLMs) acquire diverse capabilities during training, their internal learning dynamics remain poorly understood. To address this, we adopt a neuroscientific perspective and analyze three interrelated dimensions: the similarity between LLMs and the human brain, the internal states of LLMs, and downstream task performance. Across models varying in data and architecture, we identify three phase transitions during training: (1) alignment with the whole brain surges as LLMs begin adhering to task instructions (Brain Alignment and Instruction Following), (2) unexpectedly, LLMs diverge from the brain during a period in which downstream task accuracy temporarily stagnates (Brain Detachment and Stagnation), and (3) alignment with the brain re-emerges as LLMs become capable of solving the downstream tasks (Brain Realignment and Consolidation). These findings illuminate the underlying mechanisms of LLM training, while opening new avenues for interdisciplinary research bridging AI and neuroscience.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: hierarchical & concept explanations, probing
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English, Japanese
Submission Number: 1702