Wasserstein Coreset via Sinkhorn Loss

TMLR Paper3333 Authors

13 Sept 2024 (modified: 24 Sept 2024) · Under review for TMLR · CC BY 4.0
Abstract: Coreset selection, a technique for compressing large datasets while preserving model performance, is crucial for modern machine learning. This paper presents a novel method for generating high-quality Wasserstein coresets using the Sinkhorn loss, which offers substantial computational advantages over exact optimal transport. Existing approaches, however, suffer from numerical instability in Sinkhorn's algorithm. We address this by proposing stable algorithms for both the forward and backward computations. We further derive an analytical formula for the derivative of the Sinkhorn loss and rigorously analyze the stability of our method. Extensive experiments demonstrate that our approach significantly outperforms existing methods in sample selection quality and computational efficiency, achieving a smaller Wasserstein distance to the original dataset.
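The abstract's central technical point is that naive Sinkhorn scaling iterations overflow or underflow when the entropic regularization is small, and that stable forward computation is possible by working in the log domain. The paper's own stabilized algorithms are not reproduced here; the sketch below is only a standard log-domain Sinkhorn iteration (a common stabilization, assumed for illustration), with the function name `sinkhorn_log`, the regularization parameter `eps`, and the iteration count chosen by us:

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_log(C, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport, computed in the log domain.

    C: (n, m) cost matrix; a, b: source/target weight vectors (each sums to 1).
    Returns the transport cost <P, C> of the regularized plan P.

    Updating the dual potentials f, g via logsumexp avoids the
    overflow/underflow of the naive scaling iterations when eps is small.
    """
    log_a, log_b = np.log(a), np.log(b)
    f = np.zeros_like(a)   # dual potential over rows
    g = np.zeros_like(b)   # dual potential over columns
    M = -C / eps           # log of the Gibbs kernel exp(-C/eps)
    for _ in range(n_iters):
        # f_i <- eps * (log a_i - logsumexp_j[(g_j - C_ij)/eps])
        f = eps * (log_a - logsumexp(M + g[None, :] / eps, axis=1))
        # g_j <- eps * (log b_j - logsumexp_i[(f_i - C_ij)/eps])
        g = eps * (log_b - logsumexp(M + f[:, None] / eps, axis=0))
    # Assemble the plan in log space and exponentiate once at the end.
    P = np.exp(M + f[:, None] / eps + g[None, :] / eps)
    return float(np.sum(P * C))
```

Because the plan is only exponentiated after the potentials have converged, the iteration stays finite even for values of `eps` at which the direct kernel `exp(-C/eps)` would underflow to zero.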
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Brian_Kingsbury1
Submission Number: 3333