Keywords: data selection, dynamic data pruning, importance sampling
TL;DR: Efficient dynamic data pruning using Bernoulli sampling weighted by historical statistics
Abstract: Dynamic data pruning, also known as data importance sampling, has been proposed to improve training efficiency. For sampling with replacement, the variance-minimizing distribution samples each example with probability proportional to its gradient norm, which can be approximated by the gradient norm of the logits obtained from an extra forward pass. However, sampling with replacement can produce repeated samples, which is often undesirable. Noticing that most dynamic data pruning methods that avoid repeated samples can be seen as weighted Bernoulli sampling, in this work we study the optimal distribution that minimizes its variance. Furthermore, to avoid the extra forward pass, we study the use of historical statistics, and we propose exponential moving averages and probability smoothing to improve performance.
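To make the idea concrete, here is a minimal sketch of weighted Bernoulli sampling driven by historical statistics, with an exponential moving average of per-example scores and probability smoothing. The class name, the keep budget, and the particular smoothing scheme are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

class BernoulliPruner:
    """Hypothetical sketch: dynamic data pruning via weighted Bernoulli sampling.

    Per-example importance scores (e.g., per-sample losses already computed
    during the training forward pass, so no extra forward pass is needed) are
    tracked with an exponential moving average; inclusion probabilities are
    smoothed toward uniform so no example's probability collapses to zero.
    """

    def __init__(self, num_examples, keep_ratio=0.5, ema_decay=0.9, smoothing=0.1):
        self.scores = np.ones(num_examples)  # EMA of historical importance scores
        self.keep_ratio = keep_ratio         # expected fraction of data kept per epoch
        self.ema_decay = ema_decay           # weight on past statistics
        self.smoothing = smoothing           # mix with uniform to keep probabilities positive

    def update(self, indices, new_scores):
        # Blend freshly observed scores into the EMA of historical statistics.
        d = self.ema_decay
        self.scores[indices] = d * self.scores[indices] + (1 - d) * new_scores

    def sample_mask(self, rng):
        # Inclusion probabilities proportional to EMA scores, scaled so the
        # expected number of kept examples is keep_ratio * N, then smoothed
        # toward uniform and clipped to valid probabilities.
        n = len(self.scores)
        p = self.scores / self.scores.sum() * self.keep_ratio * n
        p = (1 - self.smoothing) * p + self.smoothing * self.keep_ratio
        p = np.clip(p, 0.0, 1.0)
        # Independent Bernoulli draw per example: no repeated samples by construction.
        return rng.random(n) < p
```

A typical usage would call `sample_mask(np.random.default_rng(0))` at the start of each epoch to select the subset to train on, then call `update` with the observed per-sample losses; because each example is an independent Bernoulli trial, repeated samples cannot occur, unlike sampling with replacement.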
Supplementary Material: zip
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8998