Keywords: robustness, benchmarking, datasets
TL;DR: We propose a new challenging dataset, inspired by domain adaptation, to benchmark the robustness of ImageNet-trained models.
Abstract: We propose ImageNet-D, a new challenging dataset to benchmark the robustness of ImageNet-trained models with respect to domain shifts. ImageNet-D has six different domains (“Real”, “Painting”, “Clipart”, “Sketch”, “Infograph” and “Quickdraw”). We show that even state-of-the-art models struggle on this dataset and find that they make well-interpretable errors. For example, even on the “Real” domain, the error rate of our best EfficientNet-L2 model rises from 11.6% on clean ImageNet to 29.2%.