Keywords: Feature Learning, Deep Learning Theory, Mean-Field Theory, Generalization, Phase Transitions, Stochastic Gradient Langevin Dynamics, Statistical Physics, Representation Learning, Finite-Width Neural Networks
TL;DR: We give a simple mean-field theory for SGLD-trained two-layer networks that predicts a symmetry-breaking transition and identifies input-feature selection as the missing mechanism needed to match their learning curves.
Abstract: Feature learning (FL), where neural networks adapt their internal representations during training, remains poorly understood. Using methods from statistical physics, we derive a tractable, self-consistent mean-field (MF) theory for the Bayesian posterior of two-layer non-linear networks trained with stochastic gradient Langevin dynamics (SGLD). At infinite width, this theory reduces to kernel ridge regression, but at finite width it predicts a symmetry-breaking phase transition at which networks abruptly align with target functions. While the basic MF theory provides theoretical insight into the emergence of FL in the finite-width regime, semi-quantitatively predicting the onset of FL as a function of noise level or sample size, it substantially underestimates the improvements in generalisation after the transition. We trace this discrepancy to a key mechanism absent from the plain MF description: self-reinforcing input-feature selection. Incorporating this mechanism into the MF theory allows us to quantitatively match the learning curves of SGLD-trained networks and provides mechanistic insight into FL.
Supplementary Material: pdf
Primary Area: interpretability and explainable AI
Submission Number: 9630