Gradient Descent on Two ReLU Neurons: Global Landscape and Bifurcation Dynamics

Binghua Li; Mengzhe Li; Denny Wu; Tianhao Wang

Published: 29 May 2026, Last Modified: 29 May 2026. Venue: HiLD at ICML 2026 Poster. License: CC BY 4.0

Keywords: feature learning, gradient flow, high-dimensional learning dynamics, multi-index models, loss landscape

Abstract: Understanding how gradient descent learns features remains a central challenge in neural network theory. We study this question in a minimal multi-index setting that already exhibits rich multi-phase gradient dynamics: a well-specified two-neuron ReLU teacher-student model under isotropic Gaussian inputs, trained by population gradient flow from small random initialization. We show that the dynamics are organized by two directions: the “easy” bisector direction, which carries the leading signal, and the “hard” splitting direction, which governs specialization. To characterize the loss structure and learning behavior, we analyze the population squared-loss landscape and show that every nonzero critical point is either a global minimum or a saddle. We then track the gradient-flow trajectory, showing that student neurons first collapse toward a bisector saddle before escaping and specializing to teacher neurons. Our results provide both a landscape account and a dynamics account of the multi-phase symmetry-breaking behavior that arises in a simple multi-index model under standard algorithmic and architectural choices.
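As a rough illustration of the setting described in the abstract, the following is a minimal sketch and not the authors' code. It evaluates the population squared loss in closed form using the standard Gaussian identity E[relu(w·x) relu(v·x)] = (‖w‖‖v‖ / 2π)(sin θ + (π − θ) cos θ), where θ is the angle between w and v, and approximates gradient flow with small Euler steps. Several choices here are assumptions made for illustration only: second-layer weights fixed to 1, orthonormal teacher directions, the dimension `d`, the step size `lr`, and the initialization scale. All function names are hypothetical.

```python
import numpy as np

def angle(w, v):
    # Angle between w and v, clipped for numerical safety.
    c = w @ v / (np.linalg.norm(w) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def k(w, v):
    # E[relu(w.x) relu(v.x)] for x ~ N(0, I).
    t = angle(w, v)
    scale = np.linalg.norm(w) * np.linalg.norm(v) / (2 * np.pi)
    return scale * (np.sin(t) + (np.pi - t) * np.cos(t))

def dk(w, v):
    # Gradient of k(w, v) in its first argument, with v held fixed.
    t = angle(w, v)
    radial = np.linalg.norm(v) * np.sin(t) * w / np.linalg.norm(w)
    return (radial + (np.pi - t) * v) / (2 * np.pi)

def loss(W, U):
    # Population loss 0.5 * E[(student(x) - teacher(x))^2].
    ss = sum(k(a, b) for a in W for b in W)
    st = sum(k(a, b) for a in W for b in U)
    tt = sum(k(a, b) for a in U for b in U)
    return 0.5 * (ss - 2 * st + tt)

def grad(W, U):
    # One gradient row per student neuron.
    return np.stack([
        sum(dk(w, b) for b in W) - sum(dk(w, u) for u in U)
        for w in W
    ])

def run(d=10, steps=2000, lr=0.05, init_scale=1e-3, seed=0):
    # Euler discretization of gradient flow from small random init.
    rng = np.random.default_rng(seed)
    U = np.eye(d)[:2]  # assumed orthonormal teacher directions
    W = init_scale * rng.standard_normal((2, d))
    losses = [loss(W, U)]
    for _ in range(steps):
        W = W - lr * grad(W, U)
        losses.append(loss(W, U))
    return W, U, np.array(losses)
```

Calling `W, U, losses = run()` and plotting `losses` alongside the angle between each row of `W` and the bisector `U[0] + U[1]` is one way to look for the plateau-then-escape behavior the abstract describes. Whether and when that behavior shows up in this sketch depends on the assumed hyperparameters.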

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 161
