Phase Transitions or Continuous Evolution? Methodological Sensitivity in Neural Network Training Dynamics
Abstract: Recent work on neural network training dynamics often identifies "transitions" or "phase changes" in weight matrices through rank-based spectral metrics. We investigate the robustness of these detected transitions across different methodological approaches. Analyzing 55 experiments spanning Transformer, CNN, and MLP architectures (30,147 measurement points), we find that transition detection using weight-space spectral metrics shows substantial sensitivity to methodological choices. Varying the detection threshold from 2σ to 100σ changes the total number of detected transitions by an order of magnitude (25,513 to 1,608). When comparing threshold-based detection with the threshold-free PELT (Pruned Exact Linear Time) algorithm, we observe negligible correlation (r = -0.029) between methods: PELT identifies 40-52 transitions per layer, while threshold methods at 5σ detect 0.00-0.09 per layer. Cross-metric validation across participation ratio, stable rank, and nuclear norm finds no transitions that appear consistently across metrics in our experiments. Extended analysis of activation-based metrics and loss landscape geometry shows similar methodological sensitivity.
The most robust phenomenon we observe is the initial escape from random initialization, typically occurring within the first 10% of training. Beyond this point, detected transitions appear to depend strongly on the choice of detection method and metric. While architecture-specific patterns emerge within each method, the lack of agreement across methods and metrics raises important questions about the interpretation of phase transitions detected through these spectral approaches.
Our findings demonstrate that weight-space spectral metrics, as currently applied, cannot reliably identify phase transitions in models at the scales we studied. We characterize why detection methods disagree (threshold methods respond to instantaneous magnitude changes, while PELT detects distributional shifts) and propose practical guidelines for practitioners. This work highlights the importance of methodological scrutiny and cross-validation when using spectral methods to characterize training dynamics.
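For concreteness, the following is a minimal sketch of the three weight-space spectral metrics and a k-sigma threshold detector of the kind described above, using their standard definitions; the paper's actual implementation may differ in details.

```python
# Minimal sketch (standard definitions; not necessarily the paper's exact code).
import numpy as np

def spectral_metrics(W: np.ndarray) -> dict:
    """Participation ratio, stable rank, and nuclear norm of a weight matrix W."""
    s = np.linalg.svd(W, compute_uv=False)   # singular values
    s2 = s ** 2                              # eigenvalues of W @ W.T
    return {
        "participation_ratio": s2.sum() ** 2 / (s2 ** 2).sum(),
        "stable_rank": s2.sum() / s2.max(),  # ||W||_F^2 / ||W||_2^2
        "nuclear_norm": s.sum(),             # sum of singular values
    }

def threshold_transitions(series: np.ndarray, k: float = 5.0) -> np.ndarray:
    """Flag steps whose metric change exceeds k standard deviations of all changes."""
    deltas = np.diff(series)
    z = (deltas - deltas.mean()) / (deltas.std() + 1e-12)
    return np.flatnonzero(np.abs(z) > k) + 1
```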
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Revision Note — Paper 6615
Dear Action Editor and Reviewers,
We have revised the manuscript according to feedback from all three reviewers. Below we summarize the principal changes.
1. Extended Metric Analysis (Reviewers xiVz, YLtC)
Both reviewers noted our analysis was limited to weight-space spectral metrics. As committed in our response to Reviewer xiVz, we have extended the analysis to include:
Activation norms (layer-wise activation magnitudes)
Gradient alignment (cosine similarity between consecutive updates)
Loss landscape sharpness (following Keskar et al. 2017)
New Section 3.3 describes these metrics, and new Section 4.6 presents the results. All extended metrics exhibit the same pattern of methodological sensitivity: PELT and threshold methods show near-zero correlation (r ∈ [-0.054, 0.023]), and no transitions appear consistently across metrics. This strengthens our central finding: methodological sensitivity is not specific to spectral metrics but is general to transition detection on neural network training trajectories.
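As an illustration of how two of these extended metrics can be computed per training step, a minimal numpy sketch follows (names and shapes are ours; sharpness in the sense of Keskar et al. additionally requires an inner maximization over an ε-ball and is omitted here):

```python
# Illustrative sketch of two extended metrics (assumed implementation).
import numpy as np

def activation_norm(acts: np.ndarray) -> float:
    """Layer-wise activation magnitude: mean L2 norm over the batch dimension."""
    return float(np.linalg.norm(acts.reshape(len(acts), -1), axis=1).mean())

def gradient_alignment(prev_grad: np.ndarray, curr_grad: np.ndarray) -> float:
    """Cosine similarity between consecutive (flattened) gradient updates."""
    a, b = prev_grad.ravel(), curr_grad.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```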
2. Mechanistic Explanation of Method Disagreement (All Reviewers)
All three reviewers noted the original submission recorded observation without explanation. We have added Section 5, "Why Detection Methods Disagree," providing mechanistic analysis:
Table 6 quantifies detection characteristics: 94% of threshold detections occur at initialization escape; 96% of PELT detections occur elsewhere
Threshold methods respond to instantaneous magnitude spikes (mean 12.3σ at detection)
PELT responds to subtle distributional shifts (variance ratio 1.12 ± 0.34)
We explain why correlation is near-zero: different temporal sensitivity, baseline inflation, and incompatible null hypotheses
This transforms "methods disagree" into "methods disagree because they measure fundamentally different trajectory features."
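The following toy example (a synthetic signal of our own construction, not data from the paper, and assuming the open-source ruptures implementation of PELT) illustrates the qualitative difference: the threshold rule keys on the size of the per-step change, while PELT keys on where a penalized segmentation of the whole series improves. Exact detections depend on the signal and the parameters.

```python
# Toy comparison of the two detector families on a synthetic metric trajectory.
import numpy as np
import ruptures as rpt  # assumes the `ruptures` package is installed

rng = np.random.default_rng(0)
series = rng.normal(0.0, 0.05, size=500)
series[0] += 2.0      # large one-step spike, analogous to initialization escape
series[300:] += 0.2   # later, smaller shift in the running level of the metric

# Threshold detector: flag steps whose instantaneous change exceeds 5 sigma.
deltas = np.diff(series)
z = (deltas - deltas.mean()) / deltas.std()
threshold_hits = np.flatnonzero(np.abs(z) > 5.0) + 1

# PELT with an L2 cost: place change points wherever the penalized fit improves.
pelt_hits = rpt.Pelt(model="l2").fit(series).predict(pen=1.0)

print("threshold detections:", threshold_hits)
print("PELT change points:", pelt_hits)
```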
3. Scope Clarification (All Reviewers)
All reviewers correctly noted our claims required tighter scoping. Revisions include:
Abstract now explicitly bounds findings to "weight-space spectral metrics" and "the methods we tested"
Section 1.1 clarifies what we examine (spectral metrics, activation metrics, loss sharpness) and what we do not (representation-space similarity metrics, functional behavior, generalization dynamics)
Discussion acknowledges that grokking and double descent manifest in test performance, not necessarily in weight-space geometry
Conclusion explicitly states: "This does not resolve whether neural network training exhibits genuine phase structure"
4. Removed Unsupported Claims (Reviewers xiVz, YLtC)
Removed the 80-85% deployment statistic, which lacked adequate support from cited sources
Removed Section 3 (theoretical framework) entirely—the Neyman-Pearson derivation relied on assumptions (i.i.d. Gaussian noise) that do not hold for SGD trajectories
5. PELT Clarification (Reviewers iFYe, xiVz)
Section 3.4.2 now explains the L2 cost function and its relationship to the metrics
Section 4.5 presents comprehensive sensitivity analysis across penalty parameters (β ∈ {0.5, 1, 5, 10, 20, 50, 100}); a minimal code sketch of such a sweep appears after this list
We explicitly acknowledge PELT is not an "objective reference" but a principled baseline with its own parameter sensitivity
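A minimal sketch of such a penalty sweep, assuming the open-source ruptures implementation of PELT with an L2 cost (the paper's own code may differ):

```python
# Sketch of a PELT penalty sweep over a 1-D metric trajectory (assumed setup).
import numpy as np
import ruptures as rpt  # assumes the `ruptures` package is installed

def pelt_sweep(series: np.ndarray, penalties=(0.5, 1, 5, 10, 20, 50, 100)) -> dict:
    """Change points returned by PELT (L2 cost) at each penalty value beta."""
    algo = rpt.Pelt(model="l2").fit(series.reshape(-1, 1))
    return {beta: algo.predict(pen=beta) for beta in penalties}
```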
6. Honest Positioning of Contribution (Reviewer YLtC)
Reviewer YLtC questioned community interest. Rather than overclaim, we have revised the conclusion to position this work honestly:
"Why does this matter? Because it prevents a specific class of false conclusions. Researchers using spectral metrics to identify training phases should know their detections are method-dependent artifacts with no cross-metric or cross-method agreement. This does not resolve what training dynamics look like—it establishes that one popular approach to studying them produces unreliable results. This is methodological infrastructure work: necessary groundwork that prevents theories from being built on unstable measurement foundations, even if it does not itself advance understanding of how neural networks learn."
We thank all three reviewers for rigorous engagement. The revised manuscript is more tightly scoped, provides mechanistic explanation for the observed disagreement, extends analysis beyond spectral metrics, and positions its contribution honestly.
Respectfully,
Assigned Action Editor: ~Ruoyu_Sun1
Submission Number: 6615