Abstract: We develop a structural perspective on redundancy in learned representations, treating \emph{redundancy} as a quantitative property of how dependence is organized rather than merely as inefficiency. We define redundancy as an $f$-divergence between a joint distribution and the product of its marginals, yielding a unified functional that recovers classical quantities such as mutual information and $\chi^2$-type dependence as special cases. We establish basic bounds and regularity properties of this functional, and we give a model-based endpoint argument showing that, under competing efficiency and robustness pressures, the attainable \emph{downstream} risk profile can admit an interior optimum at a nonzero redundancy level (i.e., under the model assumptions, neither minimizing nor maximizing redundancy is optimal). Empirically, we conduct controlled sweeps with masked autoencoders (MAE), organizing outcomes by a \emph{realized} redundancy coordinate computed on frozen probe features, and we report linear-probe accuracy together with proxy-consistency checks across multiple redundancy diagnostics, including a spectral effective-rank statistic derived from covariance geometry. Together, within our controlled MAE-based study, our results support redundancy as a measurable coordinate for analyzing representation organization in finite learning systems.
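To make the central definition concrete (the notation below is an illustrative reconstruction from the abstract, not the paper's verbatim statement): for a convex function $f$ with $f(1) = 0$, the redundancy of a pair $(X, Y)$ can be written as
$$ R_f(X; Y) \;=\; D_f\!\left(P_{XY} \,\middle\|\, P_X \otimes P_Y\right), $$
so that $f(t) = t \log t$ recovers the mutual information $I(X; Y)$ and $f(t) = (t - 1)^2$ recovers the $\chi^2$-type (mean-square contingency) dependence.

As a minimal sketch of one diagnostic mentioned above, assuming the spectral effective-rank statistic follows the standard entropy-based definition of Roy and Vetterli (2007); the function name and array shapes here are hypothetical, not the paper's code:

import numpy as np

def effective_rank(features: np.ndarray) -> float:
    # features: (n_samples, d) matrix of frozen probe features.
    # Center the features and form the empirical covariance.
    X = features - features.mean(axis=0, keepdims=True)
    cov = X.T @ X / X.shape[0]
    # Eigenvalues of the symmetric PSD covariance; clip roundoff negatives.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    p = eigvals / eigvals.sum()
    # Effective rank = exp(Shannon entropy of the normalized spectrum).
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return float(np.exp(entropy))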
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Bernhard_C_Geiger1
Submission Number: 6183