Learning Distances from Data with Normalizing Flows and Score Matching

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: Our work improves estimation of Fermat distances by combining normalizing flows, score-based models, geodesic smoothing, and a new dimension-adapted Fermat distance for better scalability.
Abstract: Density-based distances (DBDs) provide a principled approach to metric learning by defining distances in terms of the underlying data distribution. By employing a Riemannian metric that increases in regions of low probability density, shortest paths naturally follow the data manifold. Fermat distances, a specific type of DBD, have attractive properties, but existing estimators based on nearest neighbor graphs suffer from poor convergence due to inaccurate density estimates. Moreover, graph-based methods scale poorly to high dimensions, as the proposed geodesics are often insufficiently smooth. We address these challenges in two key ways. First, we learn densities using normalizing flows. Second, we refine geodesics through relaxation, guided by a learned score model. Additionally, we introduce a dimension-adapted Fermat distance that scales intuitively to high dimensions and improves numerical stability. Our work paves the way for the practical use of density-based distances, especially in high-dimensional spaces.
Lay Summary: One way to measure how far apart two points are is to calculate the straight-line distance between them. For some kinds of data, e.g., images, interpolating along a straight line between two points doesn't yield realistic results. In these cases, it makes sense to take many small steps, each of which is a realistic interpolation, and stitch them together into a curved trajectory. We can then compute the length of this trajectory, weighting each step by how unlikely it is, to get a realistic idea of the distance between far-apart points. We improve a method for computing such a distance, called the Fermat distance, using machine learning models called normalizing flows and score matching models. We show that this greatly improves the accuracy of the estimated distance, even in high-dimensional spaces, where the problem becomes more difficult.
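The idea of a trajectory length in which each step is weighted by how unlikely it is can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the commonly used density-weighted length, the integral of p(x)^(-beta) along the path, uses a standard Gaussian as a stand-in for a density learned by a normalizing flow, and does not include the dimension-adapted variant or the score-guided relaxation. The names `gaussian_density`, `weighted_path_length`, and the exponent `beta` are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_density(x):
    """Standard normal density in d dimensions (assumed stand-in for a learned density)."""
    d = x.shape[-1]
    return np.exp(-0.5 * np.sum(x**2, axis=-1)) / (2 * np.pi) ** (d / 2)


def weighted_path_length(path, density, beta=1.0):
    """Approximate the integral of density(x)**(-beta) along a polyline.

    `path` has shape (n_points, d). Each segment contributes its Euclidean
    length times the weight evaluated at the segment midpoint.
    """
    segments = path[1:] - path[:-1]
    segment_lengths = np.linalg.norm(segments, axis=-1)
    midpoints = 0.5 * (path[1:] + path[:-1])
    weights = density(midpoints) ** (-beta)
    return float(np.sum(weights * segment_lengths))


def polyline(waypoints, points_per_segment=200):
    """Densely sample the piecewise-linear path through the given waypoints."""
    pieces = [
        np.linspace(start, end, points_per_segment, endpoint=False)
        for start, end in zip(waypoints[:-1], waypoints[1:])
    ]
    return np.vstack(pieces + [waypoints[-1][None, :]])


a = np.array([-2.0, 0.0])
b = np.array([2.0, 0.0])
detour_point = np.array([0.0, 2.0])

# Path through the high-density center versus a detour through lower density.
straight = weighted_path_length(polyline(np.stack([a, b])), gaussian_density)
detour = weighted_path_length(polyline(np.stack([a, detour_point, b])), gaussian_density)

# With beta = 0 the weights are all 1, so the Euclidean length is recovered.
euclidean = weighted_path_length(polyline(np.stack([a, b])), gaussian_density, beta=0.0)
```

In this toy setting the path through the high-density center is shorter under the weighted length than the detour through the low-density region, and setting `beta=0` recovers the ordinary Euclidean length. A distance of this kind would then be the minimum of the weighted length over all paths connecting the two points.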
Link To Code: https://github.com/vislearn/Fermat-Distance
Primary Area: General Machine Learning->Representation Learning
Keywords: density-based distance, Fermat distance, Riemannian geometry, representation learning, normalizing flows, score matching
Submission Number: 16182