Measure Before You Look: Grounding Embeddings Through Manifold Metrics

Published: 23 Sept 2025, Last Modified: 13 Nov 2025
NeurReps 2025 Poster
License: CC BY 4.0
Keywords: Manifold learning, Dimensionality reduction, Tangent space approximation, Local intrinsic dimensionality, Representation learning, Geometry-aware regularization
TL;DR: We propose a geometry-aware framework using intrinsic dimensionality metrics to assess embedding quality and show that Jacobian-regularized autoencoders yield embeddings that better preserve the original manifold structure.
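The TL;DR above refers to Jacobian-regularized autoencoders. As a minimal numpy sketch (not the paper's training code), the per-sample quantity such a penalty adds to the loss, the squared Frobenius norm of the encoder's Jacobian, can be estimated by finite differences; the `encoder` below is an illustrative stand-in, and `jacobian_frobenius_sq` is a hypothetical helper name:

```python
import numpy as np

def jacobian_frobenius_sq(f, x, eps=1e-5):
    """Central-difference estimate of ||J_f(x)||_F^2, the per-sample
    quantity a Jacobian Frobenius penalty adds to the training loss."""
    cols = [(f(x + eps * e) - f(x - eps * e)) / (2.0 * eps)
            for e in np.eye(x.shape[0])]
    J = np.stack(cols, axis=1)            # shape: (output_dim, input_dim)
    return float((J ** 2).sum())

# Illustrative encoder: one tanh layer with fixed random weights
# (stand-in for the paper's autoencoder, which is not specified here).
rng = np.random.default_rng(1)
W, b = rng.normal(size=(2, 3)), rng.normal(size=2)
encoder = lambda x: np.tanh(W @ x + b)

x = rng.normal(size=3)
print(f"penalty at x: {jacobian_frobenius_sq(encoder, x):.4f}")
```

Driving this quantity down shrinks the singular values of the local Jacobian, i.e. it contracts the encoder's tangent map, which is the effect the abstract reports during autoencoder refinement.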
Abstract: Dimensionality reduction methods are routinely employed across scientific disciplines to make high-dimensional data amenable to analysis. Despite their widespread use, we often lack tools to assess whether the resulting embeddings are faithful to the underlying manifold structure; without a rigorous quantitative assessment of an embedding's structural properties, its degree of preservation or distortion of the data's manifold geometry is difficult to quantify. We introduce a complementary suite of geometric metrics to quantitatively audit embedding fidelity across neighborhood sizes: Tangent Space Approximation (TSA), Local Intrinsic Dimensionality (LID), and Participation Ratio (PR). We compare the dimensionality of each sample before and after embedding; points that preserve similar values across the transformation are deemed geometrically faithful and thus representative of the true manifold structure of the data. Across synthetic and biological datasets, we show that these metrics expose distinct embedding failure modes: TSA is most sensitive to small-scale geometric distortions, LID captures heterogeneity in mixed-density regions, and PR diagnoses global variance structure. Finally, we demonstrate that applying Jacobian Frobenius penalties during autoencoder refinement of intermediate representations contracts tangent spaces, reduces disagreement between metrics, and improves alignment with intrinsic manifold geometry, as measured by rank correlations to the original spaces. These results motivate moving beyond visual heuristics toward principled, geometry-based choices that inform method selection, improve representations, and support geometry-aware objectives for representation learning.
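The before/after comparison described in the abstract can be sketched in a few lines of numpy. This is an illustration under stated assumptions, not the paper's implementation: TSA is omitted, PR is computed from covariance eigenvalues, LID uses the standard Levina-Bickel maximum-likelihood estimator, the embedding is plain PCA as a stand-in for any reduction method, and all function names (`participation_ratio`, `local_intrinsic_dim`, `spearman`) are illustrative:

```python
import numpy as np

def participation_ratio(X):
    """Global PR: (sum lam)^2 / sum(lam^2) over covariance eigenvalues."""
    lam = np.clip(np.linalg.eigvalsh(np.cov(X, rowvar=False)), 0.0, None)
    return lam.sum() ** 2 / (lam ** 2).sum()

def local_intrinsic_dim(X, k=10):
    """Per-point LID via the Levina-Bickel maximum-likelihood estimator."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    knn = np.sort(D, axis=1)[:, :k]          # k nearest-neighbor radii
    return (k - 1) / np.log(knn[:, -1:] / knn[:, :-1]).sum(axis=1)

def spearman(a, b):
    """Rank correlation between two per-point metric vectors."""
    ra, rb = a.argsort().argsort(), b.argsort().argsort()
    return np.corrcoef(ra, rb)[0, 1]

# Toy manifold: a Swiss roll (intrinsic dimension 2 in ambient R^3).
rng = np.random.default_rng(0)
t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, 500)
h = rng.uniform(0.0, 10.0, 500)
X = np.stack([t * np.cos(t), h, t * np.sin(t)], axis=1)

# Linear 2-D embedding via PCA (stand-in for any reduction method).
Xc = X - X.mean(axis=0)
Y = Xc @ np.linalg.svd(Xc, full_matrices=False)[2][:2].T

# Audit: per-point LID before vs. after embedding; points whose ranks
# agree across the transformation are the geometrically faithful ones.
rho = spearman(local_intrinsic_dim(X), local_intrinsic_dim(Y))
print(f"PR before: {participation_ratio(X):.2f}, after: {participation_ratio(Y):.2f}")
print(f"LID rank correlation (before vs. after): {rho:.2f}")
```

The rank correlation at the end mirrors the abstract's evaluation of alignment with the original space; a higher value indicates that the embedding preserves each point's local dimensionality ordering.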
Submission Number: 68