Abstract: We present a theoretical paradigm that extends classical information theory to finite and structured systems by redefining \emph{redundancy} as a fundamental quantity of information organization rather than inefficiency.
Within an $f$-divergence framework, redundancy is formalized as $\mathcal{R}_{f}(X) = D_{f}(P_X \,\|\, \Pi_X) = \mathbb{E}_{\Pi_X}\!\big[f\!\big(\tfrac{p(x)}{\prod_i p_i(x_i)}\big)\big]$, where $p(x)$ is the joint density of $(X_1,\dots,X_n)$, $p_i(x_i)$ are its marginals, $\Pi_X$ is the product of those marginals, and $f$ is a convex kernel defining the geometry of informational dependence.
Different choices of $f$ recover mutual information, $\chi^2$ redundancy, and spectral redundancy as special cases, unifying diverse notions under a single mathematical principle.
This reveals that classical measures are not isolated heuristics but projections of a single redundancy geometry.
The framework shows that redundancy is bounded both above and below, yielding a natural equilibrium $R^{*}$ between over-compression (loss of structure) and over-coupling (collapse).
Whereas minimizing redundancy optimizes transmission efficiency in the asymptotic regime, finite, structured systems (where real-world learning operates) achieve maximal stability and generalization near this equilibrium. Thus, redundancy emerges as a \emph{structural information principle}: a self-organizing property that governs how information is coherently structured rather than transmitted. Experiments with masked autoencoders (MAEs) serve to \emph{verify and visualize} the theory rather than to pursue performance benchmarks. They confirm the predicted equilibrium $R^{*}$, at which latent redundancy stabilizes and generalization peaks. Together, these results establish redundancy as a measurable and tunable quantity bridging the asymptotic world of communication and the finite world of learning.
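As a minimal numerical illustration of the definition above, the sketch below (a hedged example, not code from the paper; the function name, kernels, and toy joint are illustrative) evaluates $\mathcal{R}_f$ for a discrete two-variable joint with the kernels $f(t)=t\log t$ and $f(t)=(t-1)^2$, which recover mutual information and the $\chi^2$ redundancy, respectively.

```python
import numpy as np

# Hedged sketch (not from the paper): estimate R_f(X) = D_f(P_X || Pi_X)
# for a discrete two-variable joint distribution.

def redundancy_f(joint, f):
    """E_{Pi_X}[ f( p(x) / prod_i p_i(x_i) ) ] for a 2D joint probability table."""
    px = joint.sum(axis=1, keepdims=True)      # marginal of X1
    py = joint.sum(axis=0, keepdims=True)      # marginal of X2
    prod = px * py                             # Pi_X: product of the marginals
    ratio = joint / prod                       # density ratio p(x) / prod_i p_i(x_i)
    return float(np.sum(prod * f(ratio)))      # expectation under Pi_X

kl_kernel = lambda t: t * np.log(t)            # f(t) = t log t   -> mutual information
chi2_kernel = lambda t: (t - 1.0) ** 2         # f(t) = (t - 1)^2 -> chi^2 redundancy

# Toy joint over two correlated binary variables (illustrative only).
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])

print("mutual information (nats):", redundancy_f(P, kl_kernel))
print("chi^2 redundancy:         ", redundancy_f(P, chi2_kernel))
```

For this toy joint, both kernels return strictly positive values that vanish only when the joint factorizes, matching the claim that the classical measures arise as special cases of a single redundancy geometry.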
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Bernhard_C_Geiger1
Submission Number: 6183