Learning Compact Representations via Intrinsic Dimension Regularization

Published: 02 Mar 2026, Last Modified: 11 Mar 2026 · ICLR 2026 Workshop GRaM Poster · CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: Intrinsic dimension, representation learning, effective rank, geometric regularization
TL;DR: Explicitly regularizing the effective rank of neural representations enables models to match dropout-level accuracy while learning 3–5× more compact, intrinsically low-dimensional representations with improved generalization guarantees.
Abstract: Neural networks learn representations in high-dimensional spaces, yet effective classification often requires only a fraction of the available dimensions. We introduce Intrinsic Dimension Regularization for Representation Learning (IDRR), a method that explicitly constrains the effective rank of learned representations during training. Using the soft effective rank---computed as the exponential of the Shannon entropy of normalized singular values---we obtain a fully differentiable measure of representation dimensionality that integrates seamlessly with gradient-based optimization. Our approach employs a two-sided regularization loss that prevents both over-expansion and over-compression, maintaining representations within an optimal "Goldilocks zone" of dimensionality. We demonstrate that IDRR combined with dropout achieves equivalent test accuracy to dropout alone while reducing representation dimensionality by 68--81\% across four benchmark datasets. On MNIST, IDRR+Dropout achieves 96.5\% accuracy with effective rank 12.6, compared to 96.6\% with effective rank 39.7 for standard dropout---a $3.2\times$ compression with no accuracy loss. Similar results hold for CNNs, where IDRR+Dropout achieves 99.2\% accuracy on MNIST with effective rank 15.7 versus 32.0 for dropout alone. We provide theoretical analysis showing that generalization bounds scale with effective rank rather than ambient dimension, yielding a $\sqrt{D/d}$ improvement when $d \ll D$. Geometric visualization reveals that IDRR produces compact representation skeletons with sharp singular value decay (3--4 orders of magnitude by the 15th component) versus the diffuse clouds of standard training. Ablation studies demonstrate robustness to hyperparameter choices across a wide range of regularization strengths and target ranks.
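For concreteness, here is a minimal PyTorch sketch of the soft effective rank described in the abstract (the exponential of the Shannon entropy of the normalized singular values). The band penalty `idrr_loss` and its bounds `d_lo`/`d_hi` are hypothetical illustrations of a two-sided loss; the paper's exact objective and hyperparameters are not specified here.

```python
import torch

def soft_effective_rank(z: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Soft effective rank of a (batch, D) matrix of representations:
    exp of the Shannon entropy of the normalized singular values.
    Fully differentiable, so it can be used directly as a training loss."""
    s = torch.linalg.svdvals(z)              # singular values of the batch
    p = s / (s.sum() + eps)                  # normalize to a probability vector
    entropy = -(p * torch.log(p + eps)).sum()
    return torch.exp(entropy)                # value in [1, min(batch, D)]

def idrr_loss(z: torch.Tensor, d_lo: float, d_hi: float) -> torch.Tensor:
    """Hypothetical two-sided penalty: zero inside the target band
    [d_lo, d_hi] (the 'Goldilocks zone'), linear outside it, penalizing
    both over-compression (rank < d_lo) and over-expansion (rank > d_hi)."""
    erank = soft_effective_rank(z)
    return torch.relu(erank - d_hi) + torch.relu(d_lo - erank)
```

In use, this penalty would be added to the task loss with a regularization weight, e.g. `loss = ce_loss + lam * idrr_loss(features, d_lo=10, d_hi=20)`; the weight and band values shown are placeholders, not the paper's settings.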
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 107