A Case for Vanilla SWD: New Perspectives on Informative Slices, Sliced-Wasserstein Distances, and Learning Rates
Abstract: The practical applications of Wasserstein distances (WDs) are constrained by their sample and computational complexities. Sliced-Wasserstein distances (SWDs) provide a workaround by projecting distributions onto one-dimensional subspaces, leveraging the more efficient, closed-form WDs for 1D distributions. However, in high dimensions, most random projections become uninformative due to the concentration of measure phenomenon. Although several SWD variants have been proposed to focus on informative slices, they often introduce additional complexity and numerical instability, and compromise the desirable theoretical (metric) properties of SWD. Amid the growing literature that directly modifies the slicing distribution, we revisit the standard, "vanilla" Sliced-Wasserstein distance through an effective-subspace model and a rescaling view of slice informativeness. We show that, under an effective-subspace-aligned notion of slice informativeness, reweighting individual slices simplifies in expectation to a single global scaling factor relating the ambient-space SWD to the effective-subspace SWD. For GD/SGD-style first-order optimization, the same factor appears as a step-size calibration effect. Extensive experiments across a range of machine learning tasks show that vanilla SWD, when properly calibrated, can often match or surpass more complex variants while retaining its simplicity and metric structure.
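For reference, here is a minimal NumPy sketch of the vanilla Monte Carlo Sliced-Wasserstein estimator the abstract defends: random directions on the unit sphere, 1D projections, and the closed-form 1D Wasserstein distance via sorted order statistics. The function name and parameters (`n_slices`, `p`, `seed`) are illustrative, not the paper's API, and it assumes two equal-size samples with uniform weights.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_slices=200, p=2, seed=None):
    """Vanilla Monte Carlo Sliced-Wasserstein distance between two
    empirical samples X, Y of shape (n, d) with equal sample sizes."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Draw slicing directions uniformly on the unit sphere S^{d-1}
    # by normalizing standard Gaussian vectors.
    theta = rng.standard_normal((d, n_slices))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    # Project both samples onto every direction: shape (n, n_slices).
    X_proj, Y_proj = X @ theta, Y @ theta
    # Closed-form 1D W_p for uniform empirical measures: match sorted
    # projections (order statistics) slice by slice.
    X_sorted = np.sort(X_proj, axis=0)
    Y_sorted = np.sort(Y_proj, axis=0)
    # Average W_p^p over samples and slices, then take the p-th root.
    return np.mean(np.abs(X_sorted - Y_sorted) ** p) ** (1.0 / p)
```

In this estimator, high-dimensional concentration makes each individual projection carry little signal; the abstract's claim is that, under the effective-subspace view, this shrinkage can be absorbed in expectation into a single global scaling factor (equivalently, a learning-rate recalibration in GD/SGD) rather than a per-slice reweighting.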
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Alain_Durmus1
Submission Number: 6977