Keywords: dataset, classification, spurious features, segmentations
Abstract: Deep classifiers are known to rely on spurious features, leading to reduced generalization. The severity of this problem varies significantly by class. We identify $15$ classes in ImageNet with very strong spurious cues, and collect segmentation masks for these challenging objects to form \emph{Hard ImageNet}. Leveraging noise-, saliency-, and ablation-based metrics, we demonstrate that models rely on spurious features in Hard ImageNet far more than in RIVAL10, an ImageNet analog to CIFAR10. We observe that Hard ImageNet objects are less centered and occupy much less space in their images than RIVAL10 objects, leading to greater spurious feature reliance. Further, we use robust neural features to automatically rank our images based on the degree of spurious cues present. Comparing images with high and low rankings within a class reveals the exact spurious features models rely upon, and shows reduced performance when spurious features are absent. With Hard ImageNet's image rankings, object segmentations, and our extensive evaluation suite, the community can begin to address the problem of learning to detect challenging objects \emph{for the right reasons}, despite the presence of strong spurious cues.
Author Statement: Yes
URL: mmoayeri.github.io/HardImagenet
TL;DR: A new perspective on classification performance: how can we learn to predict *for the right reasons* when our data is suboptimal (i.e., riddled with spurious cues)?
Supplementary Material: pdf
Dataset Url: mmoayeri.github.io/HardImageNet
This page and the accompanying GitHub repo contain all code to download the data, evaluate models on the benchmark, and generate the plots shown in the paper.
License: CC-0: Creative Commons Public Domain Dedication
Contribution Process Agreement: Yes
In Person Attendance: Yes