Abstract: Coreset selection, a technique for compressing large datasets while preserving performance, is crucial for modern machine learning. This paper presents a novel method for generating high-quality Wasserstein coresets using the Sinkhorn loss, a computationally efficient surrogate for the Wasserstein distance. However, existing approaches suffer from numerical instability in Sinkhorn's algorithm. We address this by proposing stable algorithms for computing and differentiating the Sinkhorn optimization problem, including an analytical formula for the derivative of the Sinkhorn loss and a rigorous stability analysis of our method. Extensive experiments demonstrate that our approach significantly outperforms existing methods in sample selection quality, computational efficiency, and the Wasserstein distance achieved.
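The numerical instability the abstract refers to arises in the standard scaling form of Sinkhorn's algorithm, where dividing by exponentiated kernel entries can overflow or underflow for small regularization. A common remedy, and plausibly related to the stabilization the paper proposes (this sketch is an illustration, not the paper's method), is to run the iterations in the log domain with log-sum-exp updates of the dual potentials:

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_log(C, a, b, eps=0.1, n_iters=200):
    """Log-domain Sinkhorn: returns the transport plan P and the
    regularized transport cost <P, C>.

    C : (n, m) cost matrix; a, b : source/target marginals (sum to 1).
    All variable names here are illustrative, not from the paper.
    """
    f = np.zeros_like(a)  # dual potential for the source marginal
    g = np.zeros_like(b)  # dual potential for the target marginal
    for _ in range(n_iters):
        # Updates use logsumexp, so no exp() of large magnitudes occurs
        # until the final (bounded) plan is formed.
        f = eps * np.log(a) - eps * logsumexp((g[None, :] - C) / eps, axis=1)
        g = eps * np.log(b) - eps * logsumexp((f[:, None] - C) / eps, axis=0)
    P = np.exp((f[:, None] + g[None, :] - C) / eps)
    return P, float((P * C).sum())
```

Note that the converged dual potentials also give the gradient of the Sinkhorn objective with respect to the marginals, which is why a stable forward pass matters for differentiation as well.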
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: 1. Fixed legend in Figure 9 and added required explanation.
2. Fixed typos in references.
3. Fixed other minor format issues.
Code: https://github.com/BoodgionWood/WCSL
Assigned Action Editor: ~Brian_Kingsbury1
Submission Number: 3333