Keywords: Importance Sampling, Convergence, Neural Network Training
Abstract: Modern-day deep learning models are trained efficiently at scale thanks to the
widespread use of stochastic optimizers such as SGD and ADAM. These optimizers
update the model weights based on a batch of uniformly sampled
training data at each iteration. However, it has previously been observed
that the training performance and overall generalization ability of the model can be
significantly improved by selectively sampling training data based on an
importance criterion, a technique known as importance sampling. Previous approaches
to importance sampling use metrics such as the loss or the gradient norm to compute
the importance scores. These methods either attempt to compute these
metrics directly, which increases training time, or approximate them
with an analytical proxy, which typically yields inferior training
performance. In this work, we propose a new sampling strategy called
IMPON, which computes importance scores based on an auxiliary
linear model that regresses the loss of the original deep model given the
current training context, at minimal additional computational cost.
Experimental results show that IMPON achieves significantly higher
test accuracy, much faster than prior approaches.
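The sketch below illustrates the general idea described in the abstract: an auxiliary linear model is fit to predict per-example losses from cheap context features, and examples are then drawn with probability proportional to the predicted loss, with inverse-probability weights to keep the mini-batch gradient estimate unbiased. It is not the authors' implementation; the feature construction, fitting schedule, and names (`LinearLossRegressor`, `sample_batch`) are assumptions for illustration only.

```python
# Minimal, illustrative sketch of loss-regression-based importance sampling.
# All names and the feature construction are hypothetical, not from the paper.
import torch
import torch.nn as nn

class LinearLossRegressor(nn.Module):
    """Auxiliary linear model that predicts the per-example loss
    from cheap per-example features (a stand-in for the 'training context')."""
    def __init__(self, feat_dim):
        super().__init__()
        self.linear = nn.Linear(feat_dim, 1)

    def forward(self, feats):
        return self.linear(feats).squeeze(-1)

def sample_batch(predicted_losses, batch_size):
    """Sample indices with probability proportional to the predicted loss,
    and return inverse-probability importance weights so the resulting
    mini-batch gradient estimate stays unbiased."""
    probs = predicted_losses.clamp(min=1e-8)
    probs = probs / probs.sum()
    idx = torch.multinomial(probs, batch_size, replacement=True)
    n = predicted_losses.numel()
    weights = 1.0 / (n * probs[idx])  # importance weights for the loss terms
    return idx, weights

if __name__ == "__main__":
    # Toy usage: fit the auxiliary regressor on observed losses,
    # then draw one importance-sampled batch.
    n, d = 1024, 16
    feats = torch.randn(n, d)          # per-example context features (assumed)
    observed_losses = torch.rand(n)    # stand-in for losses of the deep model

    regressor = LinearLossRegressor(d)
    opt = torch.optim.SGD(regressor.parameters(), lr=0.1)
    for _ in range(200):               # fit the linear loss regressor
        opt.zero_grad()
        loss = nn.functional.mse_loss(regressor(feats), observed_losses)
        loss.backward()
        opt.step()

    with torch.no_grad():
        predicted = regressor(feats)
    idx, w = sample_batch(predicted, batch_size=32)
    print(idx.shape, w.shape)
```

In this kind of scheme, the weighted loss for the sampled batch would be `(w * per_example_loss[idx]).sum()`, which compensates for the non-uniform sampling probabilities.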