A primer on analytical learning dynamics of nonlinear neural networks

Rodrigo Antonio Carrasco-Davis; Erin Grant

Published: 23 Jan 2025, Last Modified: 26 Feb 2025, ICLR 2025 Blogpost Track, CC BY 4.0

Blogpost Url: https://d2jud02ci9yv69.cloudfront.net/2025-04-28-analytical-simulated-dynamics-89/blog/analytical-simulated-dynamics/

Abstract: The learning dynamics of neural networks (in particular, how parameters change over time during training) describe how data, architecture, and algorithm interact over time to produce a trained neural network model. Characterizing these dynamics in general remains an open problem in machine learning, but restricting the setting makes careful empirical studies, and even analytical results, possible. In this blog post, we review approaches to analyzing the learning dynamics of nonlinear neural networks, focusing on a particular setting known as *teacher-student* that permits an explicit analytical expression for the generalization error of a nonlinear neural network trained with online gradient descent. We provide an accessible mathematical formulation of this analysis and a `JAX` codebase that simulates the analytical system of ordinary differential equations alongside neural network training in this setting. We conclude with a discussion of how this analytical paradigm has been used to investigate generalization in neural networks and beyond.
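To make the teacher-student setting concrete, here is a minimal JAX sketch. It is not the authors' codebase: the function names (`committee_output`, `train_online`, `analytic_generalization_error`, `run_experiment`) and all hyperparameters are hypothetical choices for this illustration. It assumes that teacher and student are soft committee machines (unweighted sums of hidden units), that the activation is $g(x) = \mathrm{erf}(x/\sqrt{2})$, that inputs are i.i.d. standard Gaussian, and that local fields are scaled by $1/\sqrt{N}$. Under these assumptions the generalization error depends on the student weights $W$ and teacher weights $B$ only through the order parameters $Q = WW^\top/N$, $R = WB^\top/N$, $T = BB^\top/N$, via the closed form of Saad & Solla (1995): $\epsilon_g = \frac{1}{\pi}\left[\sum_{ik} \arcsin\frac{Q_{ik}}{\sqrt{(1+Q_{ii})(1+Q_{kk})}} + \sum_{nm} \arcsin\frac{T_{nm}}{\sqrt{(1+T_{nn})(1+T_{mm})}} - 2\sum_{in} \arcsin\frac{R_{in}}{\sqrt{(1+Q_{ii})(1+T_{nn})}}\right]$. The sketch trains a student with online SGD and compares this closed form against a Monte Carlo estimate.

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import erf


def g(x):
    # Hidden-unit activation assumed here: g(x) = erf(x / sqrt(2)).
    return erf(x / jnp.sqrt(2.0))


def committee_output(weights, xi):
    # Soft committee machine: unweighted sum over hidden units of g(w . xi / sqrt(N)).
    fields = xi @ weights.T / jnp.sqrt(xi.shape[-1])
    return jnp.sum(g(fields), axis=-1)


def mse_loss(student, teacher, xi):
    # Half squared error between student and teacher outputs, averaged over a batch.
    diff = committee_output(student, xi) - committee_output(teacher, xi)
    return 0.5 * jnp.mean(diff ** 2)


@jax.jit
def train_online(student, teacher, inputs, lr):
    # Online SGD: each row of `inputs` is a fresh example, used once and discarded.
    # With fields scaled by 1/sqrt(N), each step changes Q and R by O(1/N).
    def step(w, xi):
        grad = jax.grad(mse_loss)(w, teacher, xi[None, :])
        return w - lr * grad, None

    student, _ = jax.lax.scan(step, student, inputs)
    return student


def order_parameters(student, teacher):
    n = student.shape[1]
    return student @ student.T / n, student @ teacher.T / n, teacher @ teacher.T / n


def analytic_generalization_error(student, teacher):
    # Uses E[g(u) g(v)] = (2/pi) arcsin(C_uv / sqrt((1 + C_uu)(1 + C_vv)))
    # for zero-mean jointly Gaussian (u, v); the factor 1/2 in the loss gives 1/pi.
    Q, R, T = order_parameters(student, teacher)
    q, t = jnp.diag(Q), jnp.diag(T)

    def term(C, a, b):
        return jnp.sum(jnp.arcsin(C / jnp.sqrt(jnp.outer(1.0 + a, 1.0 + b))))

    return (term(Q, q, q) + term(T, t, t) - 2.0 * term(R, q, t)) / jnp.pi


def run_experiment(n=200, k=2, m=2, lr=0.5, alpha=50, init_scale=0.1, seed=0):
    key_t, key_s, key_train, key_test = jax.random.split(jax.random.PRNGKey(seed), 4)
    teacher = jax.random.normal(key_t, (m, n))
    student = init_scale * jax.random.normal(key_s, (k, n))
    eg_initial = analytic_generalization_error(student, teacher)
    # alpha * N online steps, i.e. training time alpha = steps / N.
    inputs = jax.random.normal(key_train, (alpha * n, n))
    student = train_online(student, teacher, inputs, lr)
    eg_final = analytic_generalization_error(student, teacher)
    # Monte Carlo estimate of the same quantity on held-out Gaussian inputs.
    eg_mc = mse_loss(student, teacher, jax.random.normal(key_test, (20_000, n)))
    return float(eg_initial), float(eg_final), float(eg_mc)


if __name__ == "__main__":
    initial, final, mc = run_experiment()
    print(f"eps_g initial={initial:.4f} final={final:.4f} monte_carlo={mc:.4f}")
```

Because the inputs are Gaussian, the local fields are exactly jointly Gaussian for any finite N, so the closed form evaluated at the trained weights should agree with the Monte Carlo estimate up to sampling noise. This sketch only evaluates $\epsilon_g$ along a simulated trajectory; it does not integrate the ordinary differential equations for the order parameters, which describe the same dynamics in the limit $N \to \infty$.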

Conflict Of Interest: **tl;dr:** No conflict with featured papers. We cite some colleagues' work. We have no conflict of interest with the classical papers that introduced the analytical approach that we focus on in this blog post, that of [Saad & Solla (1995)](https://doi.org/10.1103/PhysRevE.52.4225) and [Riegler & Biehl (1995)](https://dx.doi.org/10.1088/0305-4470/28/20/002) (hereafter SS+RB95). We aim to cite all works making use of the SS+RB95 analytical approach and most that make use of the teacher-student setting, which includes the work of several of our colleagues, among many others. We do not highlight our colleagues' work and cite no work of our own. Our goal is to make the SS+RB95 approach more accessible so that others interested in analytical learning dynamics can build on it.

Submission Number: 9
