The data manifold under the microscope

19 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: data manifold, manifold learning, generalization bounds, controlled datasets, deep learning theory
Abstract: A significant gap exists between theory and practice in deep learning. One example is given by generalization and approximation error bounds, which are often derived for overly simplified models or yield guarantees that are too loose to be informative. Many such bounds rely on the manifold hypothesis and depend on geometric regularity properties, including intrinsic dimension, curvature, and reach of the data manifold or target functions. To make progress on improving these bounds, one needs detailed insight into data manifold geometry and suitable benchmarks on simple datasets. However, existing datasets and analysis tools typically fall into two extremes: analytically defined manifolds with precisely known geometry but limited realism, or real-world datasets where bounds are assessed only through downstream performance and geometric properties can be estimated only coarsely and with hard-to-quantify error. To address this lack of simple yet realistic datasets and accompanying geometric tools, we introduce a benchmarking framework for studying data geometry. We repurpose and extend the dSprites and COIL-20 datasets with additional transformation dimensions and finer sampling resolution. This enables accurate finite-difference estimates of geometric quantities such as curvature, reach, and volume, yielding a flexible benchmark for evaluating manifold learning methods. As illustrative applications, we assess two established manifold learning bounds by Genovese et al. and Fefferman et al., and analyze how manifold geometry evolves across network layers in $\beta$-VAEs. Our results highlight both the limitations of existing bounds and the value of controlled benchmarks for guiding future theoretical developments.
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 14642