Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: Statistical and computational guarantees are provided for the kernelized version of the max-sliced Wasserstein distance.
Abstract: Optimal transport has been very successful for various machine learning tasks; however, it is known to suffer from the curse of dimensionality. Hence, dimensionality reduction is desirable when it is applied to high-dimensional data with low-dimensional structures. The kernel max-sliced (KMS) Wasserstein distance is developed for this purpose by finding an optimal nonlinear mapping that reduces data to one dimension before computing the Wasserstein distance. However, its theoretical properties have not yet been fully developed. In this paper, we provide sharp finite-sample guarantees, under milder technical assumptions than the state of the art, for the KMS $p$-Wasserstein distance between two empirical distributions with $n$ samples for general $p\in[1,\infty)$. On the algorithmic side, we show that computing the KMS $2$-Wasserstein distance is NP-hard, and we then propose a semidefinite relaxation (SDR) formulation, which can be solved efficiently in polynomial time, and provide a bound on the relaxation gap of the obtained solution. We provide numerical examples demonstrating the good performance of our scheme for high-dimensional two-sample testing.
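
To make the slicing idea concrete, the following is a minimal illustrative sketch, not the paper's SDR algorithm: it lower-bounds the KMS 2-Wasserstein distance between two equal-size samples by random search over unit-norm projection directions in the RKHS spanned by the pooled data, using a Gaussian kernel. The function names, the kernel choice, and the random-search heuristic are assumptions made here for illustration only.

```python
# Illustrative sketch of the kernel max-sliced (KMS) 2-Wasserstein idea:
# project both samples onto a nonlinear RKHS "slice" and maximize the 1-D
# Wasserstein distance over slices. Random search is used instead of the
# paper's semidefinite relaxation, so this only gives a lower bound.
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gram matrix of the Gaussian (RBF) kernel between rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * bandwidth**2))

def w2_1d(u, v):
    """2-Wasserstein distance between two equal-size 1-D empirical samples."""
    return np.sqrt(np.mean((np.sort(u) - np.sort(v)) ** 2))

def kms_w2_random_search(X, Y, bandwidth=1.0, n_trials=2000, seed=0):
    """Lower bound on the KMS 2-Wasserstein distance via random RKHS slices.

    Each slice is f(.) = sum_i alpha_i k(z_i, .) with unit RKHS norm, where
    z_1, ..., z_{2n} are the pooled samples; the best slice found is kept.
    """
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])                    # pooled samples span the slices
    K = gaussian_kernel(Z, Z, bandwidth)     # Gram matrix of the pooled data
    KX = gaussian_kernel(X, Z, bandwidth)    # projections of X onto the span
    KY = gaussian_kernel(Y, Z, bandwidth)
    best = 0.0
    for _ in range(n_trials):
        alpha = rng.standard_normal(len(Z))
        alpha /= np.sqrt(alpha @ K @ alpha)  # enforce unit RKHS norm
        best = max(best, w2_1d(KX @ alpha, KY @ alpha))
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 20))
    Y = rng.standard_normal((100, 20)) + 0.5  # mean-shifted alternative
    print(kms_w2_random_search(X, Y))
```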
Lay Summary: In "Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances", Wang, Boedihardjo, and Xie introduce a rigorous theoretical and algorithmic framework for the Kernel Max-Sliced (KMS) Wasserstein distance, a flexible and powerful tool for comparing probability distributions in high-dimensional spaces. KMS generalizes the max-sliced Wasserstein distance by replacing linear projections with nonlinear projections in a Reproducing Kernel Hilbert Space (RKHS), capturing nonlinear differences between distributions more effectively. The authors provide dimension-free, finite-sample guarantees for the KMS p-Wasserstein distance, showing it converges at the optimal rate under mild assumptions. On the computational side, they prove that computing the KMS 2-Wasserstein distance is NP-hard and thus develop a tractable semidefinite relaxation (SDR) formulation with provable approximation bounds and efficient first-order optimization algorithms. Notably, they also establish a novel rank bound on the SDR solutions and propose a rank-reduction procedure for improved interpretability and performance. Extensive experiments demonstrate the superior performance of this framework in high-dimensional two-sample testing, human activity change detection, and generative modeling. The KMS Wasserstein distance outperforms various baselines, including MMD, Sinkhorn divergence, and sliced Wasserstein distances, especially when the data exhibits nonlinear structures.
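
As a hedged usage sketch, a statistic such as the one above can be calibrated for two-sample testing by permutation. This is a generic recipe rather than necessarily the exact testing procedure used in the paper; the function name below is an illustrative assumption.

```python
# Generic permutation two-sample test: shuffle the pooled sample and compare
# the observed statistic against its permutation distribution.
import numpy as np

def permutation_two_sample_test(X, Y, statistic, n_perms=200, seed=0):
    """Permutation p-value for H0: X and Y come from the same distribution.

    `statistic` is any callable (X, Y) -> float that is large under the
    alternative, e.g. the kms_w2_random_search sketch above.
    """
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n = len(X)
    observed = statistic(X, Y)
    exceed = 0
    for _ in range(n_perms):
        idx = rng.permutation(len(Z))              # shuffle the pooled sample
        if statistic(Z[idx[:n]], Z[idx[n:]]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perms + 1)            # add-one correction
```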
Primary Area: General Machine Learning->Kernel methods
Keywords: Optimal transport, finite-sample guarantees, rank-constrained optimization
Submission Number: 5155