Phase Transitions or Continuous Evolution? Methodological Sensitivity in Neural Network Training Dynamics
Abstract: Recent work on neural network training dynamics often identifies transitions or phase changes in weight matrices through rank-based metrics. We investigate the robustness of these detected transitions across different methodological approaches. Analyzing 55 experiments spanning Transformer, CNN, and MLP architectures (30,147 measurement points), we find that transition detection
exhibits substantial sensitivity to methodological choices. Varying the detection threshold from 2$\sigma$ to 100$\sigma$ changes the total number of detected transitions by more than an order of magnitude (from 25,513 to 1,608). Comparing threshold-based detection with the threshold-free PELT (Pruned Exact Linear Time) algorithm, we observe negligible correlation ($-0.029$) between the two methods: PELT identifies 40--52 transitions per layer, while threshold methods at 5$\sigma$ detect 0.00--0.09. Cross-metric validation using participation ratio, stable rank, and nuclear norm finds no transitions that appear consistently across all three metrics in our experiments.
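To make the comparison concrete, the following is a minimal sketch of the two detection strategies on a per-layer metric trajectory, assuming a simple $k\sigma$ rule on one-step metric changes and the `ruptures` implementation of PELT; the threshold rule and the PELT penalty value are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: threshold-based vs. PELT change-point detection on a
# per-layer metric trajectory (e.g., stable rank over training steps).
# The k-sigma rule and the PELT penalty are illustrative assumptions,
# not the paper's exact settings.
import numpy as np
import ruptures as rpt  # pip install ruptures

def threshold_transitions(series: np.ndarray, k: float = 5.0) -> np.ndarray:
    """Flag steps whose one-step change exceeds k standard deviations
    of all one-step changes in the series."""
    diffs = np.diff(series)
    sigma = diffs.std()
    return np.flatnonzero(np.abs(diffs) > k * sigma) + 1

def pelt_transitions(series: np.ndarray, penalty: float = 10.0) -> list:
    """Threshold-free detection with PELT (Killick et al., 2012)."""
    algo = rpt.Pelt(model="rbf", min_size=5).fit(series.reshape(-1, 1))
    # predict() returns change-point indices, ending with len(series).
    return algo.predict(pen=penalty)[:-1]

# Toy trajectory: smooth decay plus noise and one abrupt shift at step 600.
rng = np.random.default_rng(0)
t = np.arange(1000)
series = np.exp(-t / 400) + 0.01 * rng.standard_normal(1000)
series[600:] += 0.3

print("k-sigma:", threshold_transitions(series, k=5.0))
print("PELT:   ", pelt_transitions(series, penalty=10.0))
```

Note that PELT's sensitivity is governed by its penalty parameter, so while it requires no explicit threshold, its output is still configuration-dependent; this is one plausible reason the two families of methods can disagree so strongly.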
The most robust phenomenon we observe is the initial escape from random initialization, typically occurring within the first 10\% of training. Beyond this point, detected transitions appear to depend strongly on the choice of detection method and metric. While architecture-specific patterns emerge within each method, the lack of agreement across methods and metrics raises
important questions about the interpretation of phase transitions in neural network training.
Our findings suggest that current detection methods cannot reliably identify phase transitions at the model scales we studied, and that training dynamics exhibit predominantly continuous evolution beyond initialization. We propose practical guidelines for practitioners, built around continuous monitoring approaches, and discuss the implications for understanding neural network optimization. This work highlights the importance of methodological scrutiny when characterizing training dynamics and suggests that multiple perspectives, both continuous and discrete, may be needed to fully understand how neural networks learn.
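For reference, the following is a minimal sketch of the three spectral metrics named above, using their standard definitions in terms of the singular values of a layer's weight matrix; the paper's implementation details may differ.

```python
# Sketch of the three rank-based metrics from the abstract, computed from
# a layer's weight matrix W. Definitions are the standard spectral ones.
import numpy as np

def rank_metrics(W: np.ndarray) -> dict:
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    lam = s ** 2                            # eigenvalues of W W^T
    return {
        # Nuclear norm: sum of singular values.
        "nuclear_norm": s.sum(),
        # Stable rank: ||W||_F^2 / ||W||_2^2.
        "stable_rank": lam.sum() / lam[0],
        # Participation ratio of the spectrum: (sum lam)^2 / sum lam^2.
        "participation_ratio": lam.sum() ** 2 / (lam ** 2).sum(),
    }

# Continuous monitoring: log these metrics every few training steps rather
# than searching for discrete transitions after the fact.
W = np.random.default_rng(0).standard_normal((256, 128)) / 16
print(rank_metrics(W))
```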
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ruoyu_Sun1
Submission Number: 6615