- Abstract: Deep Neural Networks (DNNs) generalize well despite their massive size and capability of memorizing all examples. There is a hypothesis that DNNs start learning from simple patterns based on the observations that are consistently well-classified at early epochs (i.e., easy examples) and examples misclassified (i.e., hard examples). However, despite the importance of understanding the learning dynamics of DNNs, properties of easy and hard examples are not fully investigated. In this paper, we study the similarities of easy and hard examples respectively among different CNNs, assessing those examples’ contributions to generalization. Our results show that most easy examples are identical among different CNNs, as they share similar dataset-dependent patterns (e.g., colors, structures, and superficial cues in high-frequency). Moreover, while hard examples tend to contribute more to generalization than easy examples, removing a large number of easy examples leads to poor generalization, and we find that most misclassified examples in validation dataset are hard examples. By analyzing intriguing properties of easy and hard examples, we discover that the reason why easy and hard examples have such properties can be explained by biases in a dataset and Stochastic Gradient Descent (SGD).
- Keywords: easy examples, hard example, CNN
- TL;DR: Unknown properties of easy and hard examples are shown, and they come from biases in a dataset and SGD.