Noise-free Loss Gradients: A Surprisingly Effective Baseline for Coreset Selection

TMLR Paper3327 Authors

11 Sept 2024 (modified: 08 Nov 2024)
Under review for TMLR
CC BY 4.0
Abstract: The exponential rise in the size and complexity of deep learning models and datasets has resulted in a considerable demand for computational resources. Coreset selection is one of the methods to alleviate this rising demand. The goal is to select a subset of a large dataset such that a model trained on the subset performs nearly on par with one trained on the full dataset, while reducing computational time and resource requirements. Existing approaches either attempt to identify remarkable samples that stand out from the rest (e.g., Forgetting, Adversarial Deepfool, EL2N) or solve complex optimization problems (e.g., submodular maximization, OMP) to compose the coresets. This paper proposes a novel and intuitive approach to efficiently select a coreset based on the similarity of loss gradients. Our method works on the hypothesis that gradients of samples belonging to a given class will point in similar directions during the early training phase. The samples with the most neighbours that produce similar gradient directions, in other words, the samples that produce noise-free gradients, will represent that class. Through extensive experimentation, we demonstrate the effectiveness of our approach in outperforming state-of-the-art coreset selection algorithms on a range of benchmark datasets from CIFAR-10 to ImageNet with architectures of varied complexity (ResNet-18, ResNet-50, VGG-16, ViT). We also demonstrate the effectiveness of our approach in generative modelling by applying coreset selection to reduce execution time for various GAN models (DCGAN, MSGAN, SAGAN, SNGAN) on different datasets (CIFAR-10, CIFAR-100, Tiny ImageNet) without significantly impacting the performance metrics.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: **1.** Corrected a spelling mistake in the second bullet point on page 1. **2.** Updated Figure 5 and Figure 6.
Assigned Action Editor: ~Pavel_Izmailov1
Submission Number: 3327
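
The selection idea described in the abstract (within each class, prefer samples whose loss gradients have the most similarly oriented neighbours) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes per-sample gradients have already been computed and flattened into a matrix, and the function name `select_coreset`, the cosine-similarity threshold `sim_threshold`, and the per-class `fraction` are all hypothetical choices made for illustration.

```python
import numpy as np

def select_coreset(grads, labels, fraction=0.1, sim_threshold=0.5):
    """Hypothetical sketch: per class, keep the samples with the most
    gradient-aligned neighbours (cosine similarity >= sim_threshold)."""
    grads = np.asarray(grads, dtype=float)   # shape (n_samples, n_params), assumed precomputed
    labels = np.asarray(labels)
    selected = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)    # indices of samples in class c
        g = grads[idx]
        # Normalise rows so that dot products are cosine similarities.
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g / np.maximum(norms, 1e-12)
        sim = g @ g.T
        np.fill_diagonal(sim, -np.inf)       # a sample is not its own neighbour
        # Count neighbours whose gradient points in a similar direction.
        counts = (sim >= sim_threshold).sum(axis=1)
        k = max(1, int(round(fraction * len(idx))))
        # Stable sort keeps the original order among ties.
        order = np.argsort(-counts, kind="stable")[:k]
        selected.extend(idx[order].tolist())
    return sorted(selected)
```

In this sketch a sample whose gradient opposes the rest of its class has no aligned neighbours and is ranked last, which mirrors the intuition of discarding noisy gradients. How gradients are obtained, and at which point in early training, is left open here because the abstract does not specify it.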