Abstract: We propose a unified \emph{information–geometric} framework that formalizes understanding in learning as a trade-off between informativeness and geometric simplicity.
An encoder $\phi$ is evaluated by the utility
\[
U(\phi)=I(\phi(X);Y)-\beta\,\mathcal{C}(\phi),
\]
where $I(\phi(X);Y)$ measures task-relevant information and $\mathcal{C}(\phi)$ penalizes curvature and intrinsic dimensionality, promoting smooth, low-complexity manifolds.
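As a rough illustration of how this utility translates into a training objective (a minimal sketch with assumed names, not the paper's implementation), the task cross-entropy can serve as a standard lower-bound surrogate for $I(\phi(X);Y)$, with a generic `complexity_penalty` standing in for the geometric term $\mathcal{C}(\phi)$:

```python
import torch
import torch.nn.functional as F

def negative_utility_loss(encoder, head, complexity_penalty, x, y, beta=1e-3):
    # Hypothetical surrogate for -U(phi): cross-entropy lower-bounds the
    # informativeness term I(phi(X); Y) up to a constant, and
    # `complexity_penalty` is a placeholder for the geometric term C(phi).
    z = encoder(x)                                   # latent code phi(x)
    info_surrogate = -F.cross_entropy(head(z), y)    # proxy for I(phi(X); Y)
    return -(info_surrogate - beta * complexity_penalty(encoder, x))
```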
Under standard manifold and regularity conditions, we establish non-asymptotic generalization bounds showing that generalization error scales with intrinsic dimension while curvature acts as a stabilizing capacity term, linking geometry to sample efficiency.
To operationalize the theory, we introduce the \emph{Variational Geometric Information Bottleneck} (\texttt{V-GIB}), a variational estimator that unifies mutual-information compression with curvature regularization via tractable geometric proxies (Hutchinson-trace, Jacobian-norm, and local-PCA estimators).
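As one concrete reading of these proxies (a minimal PyTorch sketch under assumed names, not the released \texttt{V-GIB} code), a Hutchinson-style estimator of the expected squared Jacobian Frobenius norm uses the identity $\mathrm{tr}(JJ^\top)=\mathbb{E}_v\|J^\top v\|^2$ for Gaussian probes $v$, so the Jacobian never has to be materialized:

```python
import torch

def jacobian_norm_proxy(encoder, x, n_probes=1):
    # Hutchinson-style estimate of E_x ||J_phi(x)||_F^2 via vector-Jacobian
    # products.  `encoder` and the batched input `x` are assumed names.
    x = x.requires_grad_(True)
    z = encoder(x)                       # latent code phi(x), shape (B, d)
    est = 0.0
    for _ in range(n_probes):
        v = torch.randn_like(z)          # Gaussian probe in latent space
        (vjp,) = torch.autograd.grad(z, x, grad_outputs=v, create_graph=True)
        est = est + vjp.flatten(1).pow(2).sum(1).mean()
    return est / n_probes
```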
Across synthetic manifolds, few-shot tasks, and real-world datasets (Fashion-MNIST, CIFAR-10), \texttt{V-GIB} exhibits a consistent information–geometry Pareto frontier, estimator stability, and substantial gains in interpretive efficiency.
Fractional-data experiments on CIFAR-10 further confirm the predicted \emph{efficiency–curvature law}: curvature-aware encoders maintain accuracy under severe data scarcity.
Overall, \texttt{V-GIB} offers a principled and measurable route to representations that are geometrically coherent, data-efficient, and aligned with human-interpretable structure, providing empirical and theoretical evidence for a geometric law of understanding in learning systems.
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Junchi_Yan1
Submission Number: 6381