Prescriptive SVD-Inspired Attention via Spectral Energy Retention

TMLR Paper9140 Authors

22 May 2026 (modified: 05 Jun 2026) · Under review for TMLR · Everyone · CC BY 4.0
Abstract: Self-attention is central to modern Transformer architectures, but its dense dot-product formulation makes it difficult to identify which internal directions are structurally important and which can be modified without disrupting the model. SVD-Inspired Attention (SVDA) addresses part of this problem by introducing a learned diagonal spectrum into the query-key score interaction, making latent attention directions explicitly inspectable through indicators such as spectral entropy, effective rank, sparsity, alignment, selectivity, and perturbation response. This paper examines the transition from diagnostic interpretation to operational intervention. A diagnosis–intervention–verification framework is proposed, in which the learned SVDA spectrum is used to guide targeted changes to the attention-score operator. The framework treats the learned spectrum as an intervention surface on which spectral coefficients can be masked, retained, regularized, or compared across heads according to their role in score formation. This view is instantiated through spectral energy retention, which converts the learned spectrum into a score-structural intervention rule. Experiments on FashionMNIST, CIFAR-10, CIFAR-100, and Food-101 show that low-energy score directions can be removed while preserving accuracy within experimental noise and reducing the score-forming part of the attention operator. The contribution is a controlled demonstration that intrinsic spectral diagnostics can be converted into verified, structurally realizable modifications of attention, rather than remaining post-hoc descriptive indicators.
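As a rough illustration of what a spectral energy retention rule might look like, the following NumPy sketch masks low-energy coefficients of a diagonal spectrum and forms attention scores from the retained directions only. It is not taken from the paper: the score form `Q diag(s) K^T / sqrt(d)`, the definition of energy as the squared coefficient, the threshold `tau`, and the function names `energy_retention_mask` and `svda_scores` are all assumptions made for illustration.

```python
import numpy as np

def energy_retention_mask(spectrum, tau=0.95):
    """Boolean mask keeping the fewest coefficients whose energy reaches tau.

    Assumption: the energy of a coefficient is its squared magnitude.
    """
    energy = np.asarray(spectrum, dtype=float) ** 2
    total = energy.sum()
    mask = np.zeros(energy.shape, dtype=bool)
    if total == 0.0:
        return mask  # nothing to retain in an all-zero spectrum
    order = np.argsort(energy)[::-1]              # highest energy first
    cumulative = np.cumsum(energy[order]) / total
    # First position where the cumulative energy fraction reaches tau.
    k = int(np.searchsorted(cumulative, tau - 1e-12)) + 1
    mask[order[:min(k, energy.size)]] = True
    return mask

def svda_scores(Q, K, spectrum, mask=None):
    """Scores under the assumed form Q diag(s) K^T / sqrt(d)."""
    s = np.asarray(spectrum, dtype=float)
    if mask is not None:
        s = np.where(mask, s, 0.0)                # zero out removed directions
    return (Q * s) @ K.T / np.sqrt(Q.shape[-1])

# Hypothetical toy data: 5 tokens, 4 latent score directions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 4))
K = rng.normal(size=(5, 4))
spectrum = np.array([3.0, 0.1, -2.0, 0.05])

mask = energy_retention_mask(spectrum, tau=0.95)
masked_scores = svda_scores(Q, K, spectrum, mask)

# Masked directions contribute nothing, so the corresponding columns of
# Q and K can be dropped outright without changing the scores.
pruned_scores = (Q[:, mask] * spectrum[mask]) @ K[:, mask].T / np.sqrt(Q.shape[-1])
```

Under these assumptions, `masked_scores` and `pruned_scores` agree, which is one way to read "structurally realizable": a masked direction can be removed from the query and key projections rather than merely zeroed at inference time.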
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Yuheng_Jia1
Submission Number: 9140