Noise-free Loss Gradients: A Surprisingly Effective Baseline for Coreset Selection

Published: 12 May 2025, Last Modified: 12 May 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: The exponential rise in the size and complexity of deep learning models and datasets has resulted in a considerable demand for computational resources. Coreset selection is one method to alleviate this rising demand. The goal is to select a subset of a large dataset on which to train a model that performs almost at par with one trained on the full dataset, while reducing computational time and resource requirements. Existing approaches either attempt to identify remarkable samples (e.g., Forgetting, Adversarial DeepFool, EL2N) that stand out from the rest, or solve complex optimization problems (e.g., submodular maximization, OMP) to compose the coresets. This paper proposes a novel and intuitive approach to efficiently select a coreset based on the similarity of loss gradients. Our method rests on the hypothesis that gradients of samples belonging to a given class point in similar directions during the early training phase. Samples with the most neighbours that produce similar gradient directions, in other words, samples that produce noise-free gradients, will represent that class. Through extensive experimentation, we demonstrate the effectiveness of our approach in outperforming state-of-the-art coreset selection algorithms on a range of benchmark datasets, from CIFAR-10 to ImageNet, with architectures of varied complexity (ResNet-18, ResNet-50, VGG-16, ViT). We also demonstrate the effectiveness of our approach in generative modelling by applying coreset selection to reduce training time for various GAN models (DCGAN, MSGAN, SAGAN, SNGAN) on different datasets (CIFAR-10, CIFAR-100, Tiny ImageNet) without significantly impacting performance metrics. Source code is provided at URL.
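The selection rule sketched in the abstract, scoring each sample by how many same-class neighbours have a similar gradient direction, then keeping the top scorers per class, could look roughly like the following. This is a minimal illustrative sketch, not the authors' implementation; the function name `select_coreset`, the cosine-similarity threshold, and the per-class budget are all assumptions for illustration, and `grads` stands in for whatever per-sample loss gradients (e.g., last-layer gradients from an early checkpoint) the method actually computes.

```python
import numpy as np

def select_coreset(grads, labels, budget_per_class, sim_threshold=0.9):
    """Hypothetical sketch: score each sample by the number of same-class
    neighbours whose loss-gradient direction is similar (cosine similarity
    above a threshold), then keep the top-scoring samples per class."""
    grads = np.asarray(grads, dtype=np.float64)
    labels = np.asarray(labels)
    # Normalise gradients so dot products become cosine similarities.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.clip(norms, 1e-12, None)
    selected = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        sims = unit[idx] @ unit[idx].T  # pairwise cosine similarity within class c
        # Count neighbours above the threshold, excluding the sample itself.
        scores = (sims > sim_threshold).sum(axis=1) - 1
        # Samples with many aligned neighbours produce "noise-free" gradients.
        keep = idx[np.argsort(-scores)[:budget_per_class]]
        selected.extend(keep.tolist())
    return sorted(selected)
```

With a toy example of two classes where one sample per class has a gradient pointing opposite to its cluster, the outliers receive a score of zero and are dropped, while the mutually aligned samples are kept.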
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Camera-ready version. Replaced the anonymised code base with the public GitHub repo link.
Video: https://drive.google.com/file/d/1h8XuoCw1BtndH2Tu8lGdmsQ3dkgK9l1q/view?usp=drive_link
Code: https://github.com/ai23resch04001/Noise_free_gradient
Supplementary Material: pdf
Assigned Action Editor: ~Pavel_Izmailov1
Submission Number: 3327