Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)

Yoonsoo Nam; Seok Hyeong Lee; Clémentine Carla Juliette Dominé; Yeachan Park; Charles London; Wonyl Choi; Niclas Alexander Göring; Seungjai Lee

Published: 01 May 2025, Last Modified: 23 Jul 2025

Venue: ICML 2025 Position Paper Track (poster)

License: CC BY 4.0

TL;DR: Solvable linear neural networks explain various phenomena in deep neural networks. We advocate for further research using solvable models and the dynamical feedback principle.

Abstract: In physics, complex systems are often simplified into minimal, solvable models that retain only the core principles. In machine learning, layerwise linear models (e.g., linear neural networks) act as simplified representations of neural network dynamics. These models follow the dynamical feedback principle, which describes how layers mutually govern and amplify each other's evolution. This principle extends beyond the simplified models, successfully explaining a wide range of dynamical phenomena in deep neural networks, including neural collapse, emergence, lazy and rich regimes, and grokking. In this position paper, we call for the use of layerwise linear models retaining the core principles of neural dynamical phenomena to accelerate the science of deep learning.
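As an illustration of the feedback described in the abstract, here is a minimal sketch, not taken from the paper or its repository. It assumes the simplest layerwise linear model, a scalar two-layer network f(x) = a·b·x fitted to a target slope s with squared loss. The function name `simulate_two_layer` and all parameter values are hypothetical choices made for this example. Under gradient descent, the update for `a` is proportional to `b` and vice versa, so each layer's growth is gated by the other.

```python
import numpy as np

def simulate_two_layer(s=1.0, a0=1e-3, b0=1e-3, lr=0.01, steps=2000):
    """Gradient descent on L = 0.5 * (s - a*b)**2 for the scalar model f(x) = a*b*x.

    Hypothetical illustration: s is the target slope, a0 and b0 are small
    initial weights, lr is the learning rate.
    """
    a, b = a0, b0
    products = np.empty(steps)
    for t in range(steps):
        err = s - a * b
        # dL/da = -b * err and dL/db = -a * err:
        # each layer's gradient is scaled by the other layer's weight.
        grad_a = -b * err
        grad_b = -a * err
        a, b = a - lr * grad_a, b - lr * grad_b
        products[t] = a * b
    return products

products = simulate_two_layer()
print("a*b after 100 steps: ", products[100])
print("a*b after 2000 steps:", products[-1])
```

With small initial weights, the product `a*b` should stay near zero for a long stretch and then rise quickly toward `s`, because neither layer can grow until the other has grown. This plateau-then-jump curve is the kind of behaviour the abstract attributes to layers amplifying each other's evolution.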

Lay Summary: Neural collapse, emergence, scaling laws, the lazy and rich regimes, and grokking are widely studied phenomena in deep neural networks. While these behaviors are often attributed to complex interactions between architecture, data, and non-linear activations, we propose a unifying explanation based on gradient dynamics in layerwise models. Notably, these models lack non-linear activations, highlighting that the layerwise structure alone is a powerful yet underappreciated characteristic of deep neural networks. In this position paper, we argue that focusing on analytically tractable, layerwise models can not only explain existing phenomena but also uncover new insights, accelerating the scientific understanding of deep learning.

Link To Code: https://github.com/yoonsoonam119/linear_first

Primary Area: Research Priorities, Methodology, and Evaluation

Keywords: science of deep learning, linear neural networks, dynamics, neural collapse, emergence, grokking, lazy/rich regime

Submission Number: 48
