Keywords: Data pruning, Deep learning
TL;DR: How convexity-based theory can fail in practice: the case of a data-pruning algorithm
Abstract: Data pruning consists of identifying a subset of the training set that can be used for training instead of the full dataset. This pruned dataset is often chosen to satisfy some desirable properties. In this paper, we leverage some existing theory on importance sampling with Stochastic Gradient Descent (SGD) to derive a new principled data pruning algorithm based on Lipschitz properties of the loss function. The goal is to identify a training subset that accelerates training (compared to e.g. random pruning). We call this algorithm $\texttt{LiPrune}$. We illustrate cases where $\texttt{LiPrune}$ outperforms existing methods and show the limitations and failures of this algorithm in the context of deep learning.
0 Replies
Loading