Abstract: We present a theoretical paradigm that extends classical information theory to finite and structured systems by redefining \emph{redundancy} as a fundamental quantity of information organization rather than inefficiency.
Within an $f$-divergence framework, redundancy is formalized as $\mathcal{R}_{f}(X) = D_{f}(P_X \,\|\, \Pi_X) = \mathbb{E}_{\Pi_X}\!\big[f\!\big(\tfrac{p(x)}{\prod_i p_i(x_i)}\big)\big]$, where $p(x)$ is the joint density of $(X_1,\dots,X_n)$, $p_i(x_i)$ are its marginals, $\Pi_X$ is the product of those marginals, and $f$ is a convex kernel defining the geometry of informational dependence.
Different choices of $f$ recover mutual information, $\chi^2$ redundancy, and spectral redundancy as special cases, unifying diverse notions under a single mathematical principle.
This reveals that classical measures are not isolated heuristics but projections of a single redundancy geometry.
The framework shows that redundancy is bounded both above and below, yielding a natural equilibrium $R^{*}$ between over-compression (loss of structure) and over-coupling (collapse).
Whereas minimizing redundancy optimizes transmission efficiency in the asymptotic regime, finite, structured systems (where real-world learning operates) achieve maximal stability and generalization near this equilibrium. Thus, redundancy emerges as a \emph{structural information principle}: a self-organizing property that governs how information is coherently structured rather than transmitted. Experiments with masked autoencoders (MAEs) serve to \emph{verify and visualize} the theory rather than to pursue performance benchmarks. They confirm the predicted equilibrium $R^{*}$, at which latent redundancy stabilizes and generalization peaks. Together, these results establish redundancy as a measurable and tunable quantity bridging the asymptotic world of communication and the finite world of learning.
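As a minimal numerical illustration of the definition above, the sketch below (a hedged example, not code from the paper; the function name, kernels, and toy joint are illustrative) evaluates $\mathcal{R}_f$ for a discrete two-variable joint with the kernels $f(t)=t\log t$ and $f(t)=(t-1)^2$, which recover mutual information and the $\chi^2$ redundancy, respectively.

```python
import numpy as np

# Hedged sketch (not from the paper): estimate R_f(X) = D_f(P_X || Pi_X)
# for a discrete two-variable joint distribution.

def redundancy_f(joint, f):
    """E_{Pi_X}[ f( p(x) / prod_i p_i(x_i) ) ] for a 2D joint probability table."""
    px = joint.sum(axis=1, keepdims=True)      # marginal of X1
    py = joint.sum(axis=0, keepdims=True)      # marginal of X2
    prod = px * py                             # Pi_X: product of the marginals
    ratio = joint / prod                       # density ratio p(x) / prod_i p_i(x_i)
    return float(np.sum(prod * f(ratio)))      # expectation under Pi_X

kl_kernel = lambda t: t * np.log(t)            # f(t) = t log t   -> mutual information
chi2_kernel = lambda t: (t - 1.0) ** 2         # f(t) = (t - 1)^2 -> chi^2 redundancy

# Toy joint over two correlated binary variables (illustrative only).
P = np.array([[0.4, 0.1],
              [0.1, 0.4]])

print("mutual information (nats):", redundancy_f(P, kl_kernel))
print("chi^2 redundancy:         ", redundancy_f(P, chi2_kernel))
```

For this toy joint, both kernels return strictly positive values that vanish only when the joint factorizes, matching the claim that the classical measures arise as special cases of a single redundancy geometry.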
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Bernhard_C_Geiger1
Submission Number: 6183