Keywords: dynamic data pruning, efficient training
Abstract: Dynamic data pruning accelerates training by focusing computation on informative samples. However, comparing importance scores computed under different model states introduces inconsistency (score context drift), and variable selection rates bias gradient dynamics over time (temporal gradient bias). We introduce RePB (Resolving Pruning Biases), a framework that addresses both issues. RePB makes pruning decisions within local windows (short sequences of batches) during training, using loss scores computed under a near-constant model state within each window so that score comparisons remain valid. These decisions determine the data subset used in the subsequent training phase. To counteract the temporal gradient bias arising from non-uniform sample inclusion, cumulative temporal rescaling reweights sample losses during training according to each sample's historical selection frequency. We provide theoretical grounding for RePB's consistency in score comparison and gradient alignment. Experiments show that RePB achieves near-full-dataset accuracy with data reduction rates above 30% in most settings, across 16 datasets, 17 models, and 13 tasks, offering a robust and scalable approach to efficient deep learning.
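To make the cumulative temporal rescaling idea concrete, here is a minimal PyTorch sketch of one plausible instantiation: per-sample losses are reweighted inversely to each sample's historical selection frequency, so less-frequently selected samples contribute proportionally more. The class name `CumulativeTemporalRescaler` and the inverse-frequency formula are illustrative assumptions on our part, not the paper's exact method.

```python
import torch

class CumulativeTemporalRescaler:
    """Illustrative (hypothetical) reweighting of per-sample losses by
    historical selection frequency; the paper's exact formula may differ."""

    def __init__(self, num_samples: int):
        # How many times each sample has been included by the pruner so far.
        self.selection_counts = torch.zeros(num_samples)
        # Number of selection rounds (training phases) observed so far.
        self.rounds = 0

    def update(self, selected_indices: torch.Tensor) -> None:
        # Record which samples the window-level pruning decision kept.
        self.selection_counts[selected_indices] += 1
        self.rounds += 1

    def weights(self, batch_indices: torch.Tensor) -> torch.Tensor:
        # Empirical inclusion frequency of each sample in the batch.
        freq = self.selection_counts[batch_indices] / max(self.rounds, 1)
        # Inverse-frequency weight (clamped for numerical safety), so that
        # cumulative gradient contributions approximate uniform inclusion.
        return 1.0 / freq.clamp(min=1e-8)

# Sketch of use inside a training step:
#   per_sample = criterion(model(x), y)        # reduction="none"
#   loss = (rescaler.weights(idx) * per_sample).mean()
#   loss.backward()
```

Under this assumed scheme, a sample selected in only half of the rounds would receive roughly twice the weight of one selected every round, compensating for its reduced presence in the gradient history.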
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 12892