Look but Don’t Touch: Gradient Informed Selection Training

Published: 27 Aug 2025, Last Modified: 01 Oct 2025 · LIMIT 2025 Poster · CC BY 4.0
Keywords: data efficient training, model training performance
Abstract: The amount of data available for training foundation models far exceeds what available compute can process, and in many domains this will likely remain the case. Moreover, not all data is equally valuable for learning, and the learning value of an example changes over the course of training. To optimize learning in this setting, several active data selection methods have been proposed; however, they either incur significant additional computational cost or offer limited performance benefits. We propose Gradient Informed Selection Training (GIST), an active data selection method that selects a core subset of examples from each mini-batch based on their gradient alignment with a small, fixed holdout set drawn from the training set. At each training step, GIST computes per-example gradients and retains only the examples most aligned with the holdout gradient, thereby guiding model updates toward better generalization. On the large, noisy, web-scraped image dataset Clothing-1M, GIST trains 3x faster in wall-clock time, uses 6x fewer steps, and achieves 4% higher final accuracy than RHO-LOSS and uniform data selection.
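The selection rule described in the abstract can be sketched as follows. This is a minimal illustration under assumed details, not the paper's implementation: it uses a linear least-squares model, cosine similarity as the alignment score, and a hypothetical top-k selection size; the paper's model, loss, and scoring may differ.

```python
import numpy as np

def per_example_grads(w, X, y):
    # Gradient of 0.5 * (x @ w - y)^2 w.r.t. w for each example:
    # (x @ w - y) * x, returned with shape (n, d).
    residual = X @ w - y
    return residual[:, None] * X

def gist_select(w, X_batch, y_batch, X_hold, y_hold, k):
    """Return the indices of the k batch examples whose per-example
    gradients are most aligned (cosine similarity) with the mean
    gradient on the fixed holdout set. (Illustrative sketch only.)"""
    g_batch = per_example_grads(w, X_batch, y_batch)        # (n, d)
    g_hold = per_example_grads(w, X_hold, y_hold).mean(0)   # (d,)
    norms = np.linalg.norm(g_batch, axis=1) * np.linalg.norm(g_hold)
    scores = (g_batch @ g_hold) / (norms + 1e-12)           # cosine alignment
    return np.argsort(scores)[::-1][:k]                     # top-k aligned
```

The selected indices would then be the only examples used to form the gradient for that training step, so the extra cost over uniform selection is one per-example gradient pass plus a dot product per example against the cached holdout gradient.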
Submission Number: 14