Keywords: data manifold, manifold learning, generalization bounds, controlled datasets, deep learning theory
Abstract: A significant gap exists between theory and practice in deep learning. While generalization and approximation error bounds have been proposed, they are often restricted to overly simplified models or yield loose guarantees. Many of these bounds rely on the manifold hypothesis and depend on geometric regularity properties such as the intrinsic dimension, curvature, or reach of the data manifold or target functions. However, evaluations of such bounds typically fall into one of two extremes: they use either synthetic, analytically defined manifolds whose geometric properties are precisely known, or real-world datasets on which the bounds are judged solely by downstream performance. Neither approach adequately reveals how data geometry affects the tightness or applicability of the theoretical results.
We propose a general-purpose framework for studying data geometry by creating dense, controllable versions of dSprites and COIL-20 with additional transformation dimensions and fine sampling resolution. This setup enables accurate finite-difference estimates of geometric measures such as curvature, reach, and volume, offering a flexible benchmark for analyzing manifold learning methods. As illustrative applications, we evaluate two established manifold learning bounds by Genovese et al. and Fefferman et al., and we examine how manifold geometry evolves across network layers in $\beta$-VAEs. Our results highlight both the limitations of existing bounds and the potential of such controlled benchmarks to guide future theoretical developments.
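The abstract's mention of finite-difference estimates of geometric quantities on densely sampled manifolds can be illustrated with a minimal sketch. The snippet below (an assumption for illustration, not the paper's actual procedure) estimates pointwise curvature of a densely sampled 1-D curve via central finite differences; for a circle of radius 1, the curvature is 1 everywhere and the reach equals 1/curvature.

```python
import numpy as np

def curvature_fd(points, dt):
    """Estimate pointwise curvature of a densely sampled curve via
    finite differences.

    points: (n, d) array of samples along the curve, parameter spacing dt.
    Returns an (n,) array of curvature estimates (boundary values use
    one-sided differences and are less accurate).
    """
    v = np.gradient(points, dt, axis=0)   # first derivative (velocity)
    a = np.gradient(v, dt, axis=0)        # second derivative (acceleration)
    speed = np.linalg.norm(v, axis=1)
    v_hat = v / speed[:, None]
    # Component of acceleration orthogonal to the tangent direction;
    # curvature = |a_perp| / |v|^2 for a regular parametrized curve.
    a_perp = a - np.sum(a * v_hat, axis=1)[:, None] * v_hat
    return np.linalg.norm(a_perp, axis=1) / speed**2

# Sanity check on a unit circle, where curvature is exactly 1.
t = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
pts = np.stack([np.cos(t), np.sin(t)], axis=1)
kappa = curvature_fd(pts, t[1] - t[0])
```

With ~2000 samples the central-difference error is O(dt^2), so interior curvature estimates agree with the true value of 1 to high precision; this is the kind of accuracy that fine sampling resolution makes possible on the proposed dense benchmarks.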
Primary Area: learning theory
Submission Number: 14642