Zero-Direction Probing: A Linear-Algebraic Framework for Deep Analysis of Large-Language-Model Drift
Abstract: We present Zero-Direction Probing (ZDP), a theoretical framework that characterizes
model drift from null directions of transformer activations, requiring no task labels or output
evaluations. Under explicit assumptions (A1–A6), we prove: (i) the Variance–Leak Theorem
(Thm. 1), (ii) Fisher Null-Conservation (Thm. 3), (iii) a Rank–Leak bound for low-rank
updates (Thm. 5), and (iv) a logarithmic-regret guarantee for online null-space trackers
(Thm. 4). We further derive a Spectral Null-Leakage (SNL) metric with a non-asymptotic
Laurent–Massart tail bound and an MP-edge–style concentration inequality, providing a
priori thresholds for drift under a Gaussian null model. Together, these results establish
that “listening to silence”—monitoring the right/left null spaces of layer activations and
their Fisher geometry—yields concrete, testable guarantees on representational change. The
manuscript is intentionally theory-only; empirical validation and benchmarking are deferred
to companion work.
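
To make the idea of monitoring null directions concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's definition of SNL, which the abstract does not give. The function names (`null_basis`, `null_leakage`, `laurent_massart_threshold`) are hypothetical. The leakage statistic is assumed to be the squared Frobenius norm of new activations projected onto the right null space of a reference activation matrix. The threshold assumes i.i.d. Gaussian noise of known variance, so that the leakage is a scaled chi-square variable, and applies the standard Laurent–Massart upper-tail bound P(X >= k + 2*sqrt(k*x) + 2*x) <= exp(-x) for X ~ chi-square with k degrees of freedom.

```python
import numpy as np


def null_basis(a_ref, tol=1e-10):
    """Orthonormal basis (d x k) of the right null space of a_ref (n x d)."""
    _, s, vt = np.linalg.svd(a_ref, full_matrices=True)
    rank = int(np.sum(s > tol * s.max()))  # numerical rank
    return vt[rank:].T


def null_leakage(a_new, n_basis):
    """Squared energy of a_new's rows along the reference null directions."""
    return float(np.sum((a_new @ n_basis) ** 2))


def laurent_massart_threshold(dof, sigma2, delta):
    """Value thr with P(sigma2 * chi2(dof) >= thr) <= delta (assumed null model)."""
    x = np.log(1.0 / delta)
    return sigma2 * (dof + 2.0 * np.sqrt(dof * x) + 2.0 * x)


# Synthetic demo: activations of width d that live in an r-dimensional subspace.
rng = np.random.default_rng(0)
n, d, r = 200, 16, 10
mixing = rng.standard_normal((r, d))
a_ref = rng.standard_normal((n, r)) @ mixing    # reference activations, rank r
a_same = rng.standard_normal((n, r)) @ mixing   # new data, same subspace
a_drift = a_same + 0.1 * rng.standard_normal((n, d))  # drift leaks into null space

basis = null_basis(a_ref)                        # shape (d, d - r)
dof = n * basis.shape[1]                         # one Gaussian term per row and direction
thr = laurent_massart_threshold(dof, sigma2=1e-4, delta=0.01)

leak_same = null_leakage(a_same, basis)
leak_drift = null_leakage(a_drift, basis)
print(f"no drift: {leak_same:.3e}  drift: {leak_drift:.3e}  threshold: {thr:.3e}")
```

In this toy setting, activations that stay in the reference subspace give leakage at the level of floating-point error, while the perturbed activations exceed the threshold computed for the assumed noise variance. The noise variance and confidence level `delta` are arbitrary choices for the demo.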
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=MLo0rUqkHz&noteId=MLo0rUqkHz
Changes Since Last Submission: Fixed the template, including margin issues.
Assigned Action Editor: ~Sebastian_U_Stich1
Submission Number: 5724