- Abstract: The contribution of each sample during model training varies across training iterations and across the model's parameters. We define sample importance as the change in parameters induced by a sample. In this paper, we explore sample importance in training deep neural networks with stochastic gradient descent. We find that "easy" samples, those that are correctly and confidently classified at the end of training, shape parameters closer to the output, while "hard" samples impact parameters closer to the input of the network. Further, "easy" samples are relevant in the early training stages, and "hard" samples in the late training stages. We also show that constructing batches from samples of comparable difficulty tends to be a poor strategy compared to maintaining a mix of hard and easy samples in every batch. Interestingly, this contradicts some results on curriculum learning, which suggest that ordering training examples by difficulty can lead to better performance.
- Conflicts: cs.unc.edu
- Keywords: Deep learning, Supervised learning