Weak-SIGReg: Covariance Regularization for Stable Deep Learning

Published: 02 Mar 2026, Last Modified: 11 Mar 2026 · ICLR 2026 Workshop GRaM Poster · CC BY 4.0
Track: tiny paper (up to 4 pages)
Keywords: Optimization Stability, Regularization, Vision Transformers, Dimensional Collapse
TL;DR: We introduce a geometric regularizer that enforces covariance isotropy to rescue ViTs and deep MLPs from optimization collapse without architectural hacks.
Abstract: Modern neural network optimization relies heavily on architectural priors, such as batch normalization and residual connections, to stabilize training dynamics. Without these, or in low-data regimes with aggressive augmentation, low-bias architectures like Vision Transformers (ViTs) often suffer from optimization collapse. This work adopts Sketched Isotropic Gaussian Regularization (SIGReg), recently introduced in the LeJEPA self-supervised framework, and repurposes it as a general optimization stabilizer for supervised learning. While the original formulation targets the full characteristic function, a computationally efficient variant, Weak-SIGReg, is derived that targets only the covariance matrix via random sketching. Inspired by interacting particle systems, this work views representation collapse as stochastic drift; SIGReg constrains the representation density towards an isotropic Gaussian, mitigating this drift. Empirically, SIGReg recovers the training of a ViT on CIFAR-100 from a collapsed 20.73% to 72.02% accuracy without architectural hacks and significantly improves the convergence of deep vanilla MLPs trained with pure SGD.
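To make the covariance-sketching idea in the abstract concrete, the following is a minimal sketch, not the authors' released code: the exact penalty form, the function name `weak_sigreg_loss`, and the hyperparameters (`num_sketches`, the weight `lam`) are assumptions. The sketch penalizes, along random unit directions, the deviation of the projected representation variance from 1, which pushes the empirical covariance toward the identity and hence toward isotropy.

```python
import torch

def weak_sigreg_loss(z: torch.Tensor, num_sketches: int = 16) -> torch.Tensor:
    """Hypothetical covariance-sketching penalty (illustrative, not the paper's code).

    z: batch of representations, shape [N, D].
    Draws `num_sketches` random unit directions a and penalizes the deviation
    of the projected variance a^T Cov(z) a from 1, so the empirical covariance
    is driven toward the identity (isotropic Gaussian target).
    """
    n, d = z.shape
    z = z - z.mean(dim=0, keepdim=True)           # center the batch
    a = torch.randn(d, num_sketches, device=z.device)
    a = a / a.norm(dim=0, keepdim=True)           # random unit directions (sketches)
    proj = z @ a                                  # [N, K] one-dimensional projections
    var = proj.pow(2).mean(dim=0)                 # variance along each direction
    return (var - 1.0).pow(2).mean()              # isotropy: unit variance everywhere

# Assumed usage pattern: added to the task loss with a weight `lam`.
# loss = task_loss + lam * weak_sigreg_loss(features)
```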
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 37