- Abstract: As deep CNN classifier performance using ground-truth labels has begun to asymptote at near-perfect levels, a key aim for the field is to extend training paradigms to capture further useful structure in natural image data and improve model robustness and generalization. In this paper, we present a novel natural image benchmark for making this extension, which we call CIFAR10H. This new dataset comprises a human-derived, full distribution over labels for each image of the CIFAR10 test set, offering the ability to assess the generalization of state-of-the-art CIFAR10 models, as well as investigate the effects of including this information in model training. We show that classification models trained on CIFAR10 do not generalize as well to our dataset as it does to traditional extensions, and that models fine-tuned using our label information are able to generalize better to related datasets, complement popular data augmentation schemes, and provide robustness to adversarial attacks. We explain these improvements in terms of better empirical approximations to the expected loss function over natural images and their categories in the visual world.
- Keywords: image classification, human experiments, risk minimization
- TL;DR: improving classifiers using human uncertainty measurements