Data-Efficient Training by Evolved Sampling

Published: 10 Oct 2024, Last Modified: 19 Nov 2024
Venue: AFM 2024 Poster
License: CC BY 4.0
Keywords: learning efficiency, evolved sampling, data selection, loss dynamics
Abstract: Data selection is designed to accelerate learning while preserving performance. To achieve this, a fundamental idea is to identify informative data samples that contribute significantly to training. In this work, we propose \textbf{Evolved Sampling} (\textbf{ES}), a simple yet effective framework for \emph{dynamic} sampling performed along the training process. This method conducts \emph{batch}-level data selection based on the \emph{differences} between historical and current losses, significantly reducing backpropagation time while maintaining model performance. ES is also readily extensible to incorporate \emph{set}-level data selection for further training acceleration. As a plug-and-play framework, ES consistently achieves lossless training acceleration across various models, datasets, and optimizers, saving up to 40\% of wall-clock time. Notably, the improvement is more significant under the \emph{noisy supervision} setting: when labels are severely corrupted, ES obtains accuracy improvements of approximately 20\% relative to standard batched sampling.
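The abstract only outlines the mechanism, so the following is a minimal PyTorch sketch of batch-level selection driven by loss differences, not the paper's actual implementation. The EMA buffer `loss_history`, the hyperparameters `keep_ratio` and `ema`, and the function name `evolved_sampling_step` are all assumptions for illustration; the paper's precise selection rule and bookkeeping may differ.

```python
import torch

def evolved_sampling_step(model, loss_fn, optimizer, inputs, targets, indices,
                          loss_history, keep_ratio=0.5, ema=0.9):
    """One training step with batch-level selection by loss differences.

    `loss_fn` must return per-sample losses (e.g. CrossEntropyLoss with
    reduction="none"). `indices` gives each sample's dataset position, and
    `loss_history` is a 1-D tensor (one slot per dataset sample) holding an
    EMA of past losses. All of this bookkeeping is assumed, not from the paper.
    """
    # Cheap scoring pass: per-sample losses without building an autograd graph.
    with torch.no_grad():
        losses = loss_fn(model(inputs), targets)  # shape: (batch_size,)

    # Score each sample by the gap between its current and historical loss.
    scores = (losses - loss_history[indices]).abs()

    # Update the historical (EMA) losses for every sample seen this step.
    loss_history[indices] = ema * loss_history[indices] + (1 - ema) * losses

    # Keep only the highest-scoring fraction of the batch.
    k = max(1, int(keep_ratio * inputs.size(0)))
    keep = scores.topk(k).indices

    # Forward + backward on the selected sub-batch only; running the backward
    # pass on a smaller batch is where the claimed time savings would come from.
    optimizer.zero_grad()
    loss_fn(model(inputs[keep]), targets[keep]).mean().backward()
    optimizer.step()
    return losses.mean().item()

# Example setup (hypothetical): loss_history = torch.zeros(len(train_dataset)),
# with the DataLoader yielding (inputs, targets, indices) triples.
```

Note that this sketch pays for an extra no-grad scoring pass to keep the example self-contained; an implementation aiming for the reported wall-clock savings would amortize or avoid that cost.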
Submission Number: 88