Zero-Direction Probing: A Linear-Algebraic Framework for Deep Analysis of Large-Language-Model Drift

TMLR Paper5724 Authors

24 Aug 2025 (modified: 13 Jan 2026). Rejected by TMLR. License: CC BY 4.0

Abstract: We present Zero-Direction Probing (ZDP), a theoretical framework that characterizes model drift from null directions of transformer activations, requiring no task labels or output evaluations. Under explicit assumptions (A1–A6), we prove: (i) the Variance–Leak Theorem (Thm. 1), (ii) Fisher Null-Conservation (Thm. 3), (iii) a Rank–Leak bound for low-rank updates (Thm. 5), and (iv) a logarithmic-regret guarantee for online null-space trackers (Thm. 4). We further derive a Spectral Null-Leakage (SNL) metric with a non-asymptotic Laurent–Massart tail bound and an MP-edge–style concentration inequality, providing a priori thresholds for drift under a Gaussian null model. Together, these results establish that “listening to silence” (monitoring the right/left null spaces of layer activations and their Fisher geometry) yields concrete, testable guarantees on representational change. The manuscript is intentionally theory-only; empirical validation and benchmarking are deferred to companion work.
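As a rough illustration of what monitoring null directions can look like in practice, the following is a minimal sketch, not the authors' implementation. It assumes the activations are collected in a matrix `H` (samples by features), takes the null space to be the right singular vectors whose singular values fall below a relative cutoff `eps`, and uses a hypothetical leakage score (the fraction of activation energy landing in those directions). The names `eps_null_basis` and `null_leak`, the cutoff rule, and the synthetic drift are all assumptions made for this example.

```python
import numpy as np

def eps_null_basis(H, eps):
    """Right singular vectors of H with singular value <= eps * sigma_max (assumed cutoff rule)."""
    _, s, Vt = np.linalg.svd(H, full_matrices=True)
    # Pad with zeros so every right singular vector has an associated singular value.
    s_full = np.zeros(Vt.shape[0])
    s_full[: s.size] = s
    mask = s_full <= eps * s_full.max()
    return Vt[mask].T  # columns span the eps-null space

def null_leak(H_new, V0):
    """Hypothetical score: share of activation energy inside the reference null directions."""
    return np.linalg.norm(H_new @ V0) ** 2 / np.linalg.norm(H_new) ** 2

rng = np.random.default_rng(0)
n, d, k = 200, 16, 10  # samples, feature width, active rank
H = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))  # rank-k reference activations
V0 = eps_null_basis(H, eps=1e-8)  # d x (d - k) null basis

# Synthetic drift: a rank-1 perturbation aimed at one null direction.
direction = V0[:, 0]
H_drift = H + 0.5 * rng.standard_normal((n, 1)) @ direction[None, :]

leak_before = null_leak(H, V0)       # essentially zero by construction
leak_after = null_leak(H_drift, V0)  # strictly positive once energy enters the null space
print(leak_before, leak_after)
```

The score needs no labels or model outputs: it compares new activations only against a null basis estimated from reference activations, which is the label-free property the abstract emphasizes.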

Submission Length: Regular submission (no more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=MLo0rUqkHz&noteId=MLo0rUqkHz

Changes Since Last Submission: Dear Reviewers,

Thank you for your careful reading of our manuscript and for the thoughtful, constructive feedback provided. We are grateful for the time and effort you invested in the review process. We have revised the paper extensively to address all comments and believe the resulting manuscript is significantly clearer, better positioned, and more practically interpretable. Below we summarize the key changes made in response to your suggestions.

Clarification of theoretical assumptions and definitions. We revised the null-space assumptions to explicitly use an $\varepsilon$-null space defined via an SVD cutoff, rather than assuming exact zero-variance directions. Theorem 1 (Variance–Leak) and related bounds were updated to include explicit $\varepsilon$-dependent residual terms, and corresponding thresholds were revised consistently. We also explicitly defined the model-induced distributions used in the analysis and clarified their role in Theorem 2. In addition, we introduced a dedicated “Notation and Abbreviations” subsection that defines activation matrices, null spaces, perturbations, and all probe quantities (NVL, SNL, FNC, BINA) in one place, ensuring that all symbols are defined before use.

Fisher Null-Conservation and robustness. We relaxed the exact Fisher-silence assumption in Theorem 3 to an approximate condition $\|F(h)V_{0,\ell}\| \le \delta_F$. The theorem and proof were rewritten accordingly, showing that the KL conclusion holds up to a controlled second-order residual. We also clarified that Fisher Null-Conservation (FNC) can be measured empirically using standard Fisher approximations and added discussion interpreting regimes where NVL is small but FNC is large, guiding practitioners in using FNC as a diagnostic tool.

Empirical illustration and validation within a theory-first paper. While the paper remains theory-focused, we added a minimal synthetic illustration as a targeted sanity check of a core theoretical prediction. The figure demonstrates that measured null leakage scales with $\sum_i \cos^2\theta_i$, as predicted by the Rank–Leak bound (Theorem 4), using controlled rank-$r$ updates with prescribed principal angles. The caption and surrounding text were revised to make clear that this serves as qualitative validation rather than benchmarking.

Role of BINA and algorithmic clarity. We clarified the role of the Bidirectional Null-Adversary (BINA), explicitly framing it as a diagnostic probe rather than a theorem-backed guarantee. A new paragraph explains how the BINA score $S_{\mathrm{BINA}}$ should be interpreted as a practical stress test of functional sensitivity along null directions, complementary to NVL/SNL and FNC. We also clarified the distinction between model outputs $f(h)$ and the training loss $L(h)$ in Algorithm 1, and revised Algorithm 1 (particularly Step 8) to explicitly describe the null-constrained adversarial update/evaluation being performed.

Positioning, related work, and scope. We substantially rewrote the Related Work section to better position the framework within existing literature, organizing prior work around concrete theoretical gaps (e.g., dominant vs. silent subspaces, covariance vs. information geometry) and explicitly connecting these gaps to our results. We added a short “Theoretical framework and positioning” subsection to clarify how the individual theorems form a coherent framework. To clarify applicability to large language models, we added an explicit subsection grounding the theory in transformer-based LLMs, explaining how activation matrices, null spaces, and low-rank fine-tuning (e.g., LoRA) arise in practice. We also clarified the linear-algebraic scope of the framework while explicitly acknowledging nonlinear effects and positioning them as future work. Finally, we ensured theorem numbering is now consecutive and consistent throughout the manuscript and resolved minor notation ambiguities.

We believe these revisions substantially improve the clarity, positioning, and accessibility of the manuscript, and we sincerely thank you for your constructive feedback, which materially strengthened the paper.

Sincerely,
The Authors
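The synthetic sanity check mentioned in the response (null leakage scaling with the sum of squared cosines of principal angles) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' experiment: it assumes the principal angles are measured between the column space of a rank-r update and the null space, that the update has unit singular values, and that leakage means the squared Frobenius norm of the update projected onto the null basis. The dimensions and angle values are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, r = 12, 5, 3  # ambient dimension, null-space dimension, update rank

# Random orthonormal frame split into a null basis V0 and its orthogonal complement C.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
V0, C = Q[:, :m], Q[:, m:]

theta = np.array([0.2, 0.7, 1.3])  # prescribed principal angles in radians (arbitrary)
# Column i mixes one null direction and one complement direction at angle theta[i],
# so the columns of U stay orthonormal.
U = V0[:, :r] * np.cos(theta) + C[:, :r] * np.sin(theta)

# Orthonormal right factor, so delta_W is a rank-r update with unit singular values.
B, _ = np.linalg.qr(rng.standard_normal((d, r)))
delta_W = U @ B.T

leak = np.linalg.norm(V0.T @ delta_W) ** 2            # energy of the update inside the null space
predicted = np.sum(np.cos(theta) ** 2)                # sum_i cos^2(theta_i)
cosines = np.linalg.svd(V0.T @ U, compute_uv=False)   # recovers cos(theta_i), sorted descending
print(leak, predicted)
```

The agreement between `leak` and `predicted` here is an exact linear-algebra identity for this construction (the singular values of `V0.T @ U` are the cosines of the principal angles); how closely it matches the paper's Rank–Leak bound depends on the paper's own definitions, which this page does not give.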

Assigned Action Editor: ~Sebastian_U_Stich1

Submission Number: 5724
