A Case for Vanilla SWD: New Perspectives on Informative Slices, Sliced-Wasserstein Distances, and Learning Rates
Abstract: The practical applications of Wasserstein distances (WDs) are constrained by their sample and computational complexities. Sliced-Wasserstein distances (SWDs) provide a workaround by projecting distributions onto one-dimensional subspaces, leveraging the more efficient, closed-form WDs for 1D distributions. However, in high dimensions, most random projections become uninformative due to the concentration of measure phenomenon. Although several SWD variants have been proposed to focus on informative slices, they often introduce additional complexity and numerical instability, and compromise the desirable theoretical (metric) properties of SWD. Amid the growing literature that directly modifies the slicing distribution, we revisit the standard, "vanilla" Sliced-Wasserstein distance through an effective-subspace model and a rescaling view of slice informativeness. We show that, under an effective-subspace-aligned notion of slice informativeness, reweighting individual slices simplifies in expectation to a single global scaling factor relating the ambient-space SWD to the effective-subspace SWD. For GD/SGD-style first-order optimization, the same factor appears as a step-size calibration effect. Extensive experiments across a range of machine learning tasks show that vanilla SWD, when properly calibrated, can often match or surpass more complex variants while retaining its simplicity and metric structure.
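For reference, here is a minimal NumPy sketch of the vanilla Monte Carlo Sliced-Wasserstein estimator the abstract defends: random directions on the unit sphere, 1D projections, and the closed-form 1D Wasserstein distance via sorted order statistics. The function name and parameters (`n_slices`, `p`, `seed`) are illustrative, not the paper's API, and it assumes two equal-size samples with uniform weights.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_slices=200, p=2, seed=None):
    """Vanilla Monte Carlo Sliced-Wasserstein distance between two
    empirical samples X, Y of shape (n, d) with equal sample sizes."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Draw slicing directions uniformly on the unit sphere S^{d-1}
    # by normalizing standard Gaussian vectors.
    theta = rng.standard_normal((d, n_slices))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    # Project both samples onto every direction: shape (n, n_slices).
    X_proj, Y_proj = X @ theta, Y @ theta
    # Closed-form 1D W_p for uniform empirical measures: match sorted
    # projections (order statistics) slice by slice.
    X_sorted = np.sort(X_proj, axis=0)
    Y_sorted = np.sort(Y_proj, axis=0)
    # Average W_p^p over samples and slices, then take the p-th root.
    return np.mean(np.abs(X_sorted - Y_sorted) ** p) ** (1.0 / p)
```

In this estimator, high-dimensional concentration makes each individual projection carry little signal; the abstract's claim is that, under the effective-subspace view, this shrinkage can be absorbed in expectation into a single global scaling factor (equivalently, a learning-rate recalibration in GD/SGD) rather than a per-slice reweighting.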
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Alain_Durmus1
Submission Number: 6977