Shake the k-Center: Toward Noise-Robust Coresets via Reliability Swapping between Neighbors

ICLR 2026 Conference Submission 15074 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: coreset; label noise
Abstract: Coreset selection aims to select a small, high-quality subset from a large-scale dataset to support DNN downstream tasks. k-Center is a solid coreset selection approach with a theoretical guarantee. It views coresets through the lens of covering theory: in the feature space, a coreset covers all data with a sphere centered on each of its samples, and a smaller covering radius indicates better quality. However, the performance of k-Center degrades and lags behind other methods on noisy datasets, and to the best of our knowledge this phenomenon still lacks an explanation. We propose a theory that accounts for it: the noise rate of the coreset constrains the generalization performance of the selected subset. Building on this theory, we propose a coreset selection method for label noise, named Shaker, whose core idea is to jointly optimize the set cover and the reliability of the coreset. Shaker first generates a batch of candidates with a small covering radius and then swaps in their reliable neighbors while maintaining a good set cover. Extensive results demonstrate that Shaker outperforms baseline methods by up to 14.3%.
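To make the covering-radius notion concrete, below is a minimal sketch of the standard greedy k-Center heuristic (farthest-first traversal), which selects centers and reports the covering radius. This illustrates only the classical k-Center baseline the abstract refers to, not the paper's Shaker method; the function name and interface are our own choices.

```python
import numpy as np

def greedy_k_center(X, k, seed=0):
    """Farthest-first traversal, a classical 2-approximation for k-Center.

    Returns the indices of k selected centers and the covering radius,
    i.e. the largest distance from any point to its nearest center.
    Illustrative sketch only; not the paper's Shaker method.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = [int(rng.integers(n))]
    # Distance from every point to its nearest chosen center so far.
    dist = np.linalg.norm(X - X[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))  # the point currently covered worst
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return centers, float(dist.max())
```

Under this framing, Shaker's swap step would replace a center whose label looks unreliable with a nearby, more reliable neighbor, accepting a slight increase in covering radius in exchange for a lower coreset noise rate.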
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 15074