Data Pruning: Counting the Frequency of Loss Transition from Above-Average to Below-Average (FATB) During Early Training

ICLR 2026 Conference Submission 18713 Authors (anonymous)

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · License: CC BY 4.0
Keywords: Data Pruning, Loss Transition
Abstract: In this paper, we propose a novel data pruning algorithm named FATB, which removes potentially redundant data and inherent noise from the original dataset during model training, thereby identifying a core subset rich in the information needed for effective training. To quantify the informational content of individual samples during training, we compare the loss of each sample with the average loss over all samples, which yields four fundamental scenarios. Combinations of these scenarios describe how a sample's loss transitions relative to the average loss over the course of training, and the informational value of a sample is derived from the influence weights associated with the four scenarios. However, computing the influence weights for all four scenarios requires substantial computational resources. To address this challenge, we approximate the transition process with a single core scenario, in which a sample's loss transitions from above the average loss to below it between adjacent training phases, and use it to estimate the sample's informational value. Moreover, since the informational contribution of a sample may vary across training phases, we count these core transitions only up to an early stopping iteration, thereby obtaining a core subset with high informational value for model training. Extensive experimental results demonstrate that the proposed method effectively eliminates redundant and noisy data, significantly improves model performance when training on smaller target-scale core subsets, and remains effective on large-scale datasets.
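
To make the counting rule concrete, the sketch below (our illustration, not the authors' released code) scores each sample by how often its loss crosses from above the per-phase average to below it between adjacent recorded phases, restricted to phases before an early-stopping iteration. The function names fatb_scores and prune, the (num_phases, num_samples) loss layout, and the assumption that higher FATB counts mark samples worth keeping are ours; the paper's exact selection rule may differ.

```python
import numpy as np

def fatb_scores(per_sample_losses):
    """Count, for each sample, how often its loss transitions From
    Above the per-phase average loss To Below it (FATB) between
    adjacent recorded training phases.

    per_sample_losses: array of shape (num_phases, num_samples), where
    row t holds every sample's loss at recorded phase t (e.g. epoch t).
    Only phases up to the early-stopping iteration should be included.
    """
    losses = np.asarray(per_sample_losses, dtype=float)
    # Whether each sample is above the average loss at each phase.
    above = losses > losses.mean(axis=1, keepdims=True)
    # A FATB event at phase t: above average at t, below at t + 1.
    fatb_events = above[:-1] & ~above[1:]
    # Score = number of FATB transitions during early training.
    return fatb_events.sum(axis=0)

def prune(per_sample_losses, keep_fraction):
    """Return indices of the highest-scoring samples (an assumed
    selection rule; the paper's exact criterion may differ)."""
    scores = fatb_scores(per_sample_losses)
    k = max(1, int(keep_fraction * scores.size))
    return np.argsort(scores)[::-1][:k]
```

For example, with per-sample losses logged over the first T epochs, keep_idx = prune(loss_log[:T], keep_fraction=0.3) would retain the 30% of samples with the most FATB transitions as the core subset.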
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 18713