Scalable Approximation Algorithms for $p$-Wasserstein Distance and Its Variants

Nathaniel Lahn; Sharath Raghvendra; Emma Saarinen; Pouyan Shirzadian

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0

Abstract: The $p$-Wasserstein distance measures the cost of optimally transporting one distribution to another, where the cost of moving a unit of mass from $a$ to $b$ is the $p^{th}$ power of the ground distance $\mathrm{d}(a,b)$ between them. Despite its strong theoretical properties, its use in practice, especially for $p \ge 2$, is limited by two key challenges: sensitivity to noise and a lack of scalable algorithms. We identify noise sensitivity as a key reason why some existing approximation algorithms for $p=1$ fail to generalize to $p \ge 2$, and we then present new algorithms for approximating the $p$-Wasserstein distance and one of its variants. First, when $\mathrm{d}(\cdot,\cdot)$ is a metric, for any constant $p \ge 2$, we present a novel relative $O(\log n)$-approximation algorithm to compute the $p$-Wasserstein distance between any two discrete distributions of size $n$. The algorithm runs in $O(n^2 \log U\log \Delta\log n)$ time, where $\log U$ is the bit-length of the input probabilities and $\Delta$ is the ratio of the largest to the smallest pairwise distance. We use $p$ hierarchically well-separated trees to define a distance that approximates the $p$-Wasserstein cost within a factor of $O(\log n)$, and then present a simple primal-dual algorithm to compute the $p$-Wasserstein cost with respect to this distance. Second, due to the noise sensitivity of the $p$-Wasserstein distance, we show that existing combinatorial approaches require $\Omega(n^2/\delta^p)$ time to approximate the $p$-Wasserstein distance within an additive error of $\delta$. In contrast, we show that, for an arbitrary distance $\mathrm{d}(\cdot,\cdot)$, a recent noise-resistant variant of the $p$-Wasserstein distance, called the $p$-RPW distance, can be approximated in $O(n^2/\delta^3)$ time.
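To make the definition concrete, here is a minimal brute-force sketch (not the paper's algorithm) that computes the exact $p$-Wasserstein cost between two uniform discrete distributions with the same number of points $n$. It relies on the standard fact that, for such distributions, some optimal transport plan is a permutation, so trying all $n!$ matchings is exact but only feasible for tiny inputs. The function names are hypothetical, and taking the distance to be the $p$-th root of the cost is the usual convention, assumed here.

```python
from itertools import permutations

def wp_cost_uniform(A, B, d, p):
    """Exact p-Wasserstein cost between uniform distributions on A and B.

    Assumes len(A) == len(B) == n, each point carrying mass 1/n. Some optimal
    plan is then a perfect matching (Birkhoff-von Neumann), so we minimize
    over all n! permutations. Exponential time: illustration only.
    """
    if len(A) != len(B) or not A:
        raise ValueError("A and B must be non-empty and of equal size")
    n = len(A)
    best = min(
        sum(d(A[i], B[sigma[i]]) ** p for i in range(n))
        for sigma in permutations(range(n))
    )
    return best / n  # each unit of matched mass is 1/n

def wp_distance_uniform(A, B, d, p):
    """p-Wasserstein distance, taken here as the p-th root of the cost."""
    return wp_cost_uniform(A, B, d, p) ** (1 / p)

if __name__ == "__main__":
    ground = lambda a, b: abs(a - b)  # ground distance on the real line
    A, B = [0, 1], [0, 3]
    print(wp_cost_uniform(A, B, ground, 2))      # matching 0->0, 1->3 is optimal
    print(wp_distance_uniform(A, B, ground, 2))
```

For $A=\{0,1\}$ and $B=\{0,3\}$ with $p=2$, the identity matching costs $(0^2+2^2)/2 = 2$ while the swapped matching costs $(3^2+1^2)/2 = 5$, so the cost is $2$ and the distance is $\sqrt{2}$. The $O(n^2\,\mathrm{polylog})$ algorithms described above exist precisely because this exhaustive search does not scale.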

Lay Summary: The $p$-Wasserstein distance is a mathematical tool for measuring the similarity between two probability distributions. It quantifies the effort required to transform one distribution into another by moving probability mass, where the cost of moving a unit of mass is the $p^{\text{th}}$ power of the ground distance between points. Despite its strong theoretical foundations, its practical use, especially for $p \geq 2$, is limited by high computational costs, which stem largely from its sensitivity to noise. This research explains why algorithms that perform well for $p=1$ often fail to scale to higher values of $p$, and it introduces new algorithms to address these challenges. The first algorithm gives a provably accurate approximation of the $p$-Wasserstein distance (within a factor logarithmic in the input size), using hierarchical tree-based structures to approximate distances efficiently. The second part of the work shows that while traditional methods become prohibitively slow for higher values of $p$ because of this increased noise sensitivity, a newer and more robust variant, the $p$-RPW distance, can be approximated significantly faster, making it a practical alternative in such scenarios.
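The noise sensitivity can be seen on a toy example. The sketch below computes the exact $p$-Wasserstein cost between two discrete distributions on the real line, where sorting both supports and matching mass monotonically is optimal for the cost $|x-y|^p$ with $p \ge 1$. It is a one-dimensional illustration only, not one of the algorithms described here; the names `wp_cost_1d` and `wp_distance_1d` and the instance are hypothetical. Masses are kept as exact fractions so the remaining-mass bookkeeping has no rounding error. Moving a fraction $\delta$ of the mass a distance $D$ away gives a cost of $\delta D^p$, i.e. a distance of $\delta^{1/p} D$, which grows with $p$ when $\delta$ is small.

```python
from fractions import Fraction

def wp_cost_1d(mu, nu, p):
    """Exact p-Wasserstein cost between discrete distributions on the line.

    mu, nu: lists of (position, mass) pairs with equal total mass.
    Uses the monotone (sorted) coupling, which is optimal for |x - y|^p, p >= 1.
    """
    a, b = sorted(mu), sorted(nu)
    i, j = 0, 0
    ra, rb = a[0][1], b[0][1]  # mass still unmatched at the current points
    cost = Fraction(0)
    while i < len(a) and j < len(b):
        m = min(ra, rb)                      # ship as much as both sides allow
        cost += m * abs(a[i][0] - b[j][0]) ** p
        ra -= m
        rb -= m
        if ra == 0:                          # advance in mu
            i += 1
            if i < len(a):
                ra = a[i][1]
        if rb == 0:                          # advance in nu
            j += 1
            if j < len(b):
                rb = b[j][1]
    return cost

def wp_distance_1d(mu, nu, p):
    """p-th root of the cost (the usual convention, assumed here)."""
    return float(wp_cost_1d(mu, nu, p)) ** (1 / p)

if __name__ == "__main__":
    delta = Fraction(1, 100)                 # fraction of "noisy" mass
    mu = [(0, Fraction(1))]                  # all mass at 0
    nu = [(0, 1 - delta), (10, delta)]       # delta of the mass moved to D = 10
    for p in (1, 2, 4):
        print(p, wp_cost_1d(mu, nu, p), wp_distance_1d(mu, nu, p))
```

With $\delta = 1/100$ and $D = 10$, the distance is $0.1$ for $p=1$, $1$ for $p=2$, and about $3.16$ for $p=4$: the same small amount of displaced mass dominates the distance as $p$ grows.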

Primary Area: Optimization->Discrete and Combinatorial Optimization

Keywords: p-Wasserstein Distance, Optimal Transport, Approximation Algorithms

Submission Number: 11627
