Keywords: object recognition, deep learning, model evaluation, tagging, generalization, out of distribution generalization
TL;DR: We propose a new test set for object recognition and evaluate a variety of object recognition and tagging models on it. We show that models fail drastically on our test set.
Abstract: Test sets are an integral part of evaluating models and gauging progress in object
recognition, and more broadly in computer vision and AI. Existing test sets for
object recognition, however, suffer from shortcomings such as bias towards
ImageNet characteristics and idiosyncrasies (e.g. ImageNet-V2), being limited to
certain types of stimuli (e.g. indoor scenes in ObjectNet), and underestimating
model performance (e.g. ImageNet-A). To mitigate these problems, here we
introduce a new test set, called D2O, which is sufficiently different from existing
test sets. Its images are diverse, unmodified, and representative of real-world
scenarios, and they cause state-of-the-art models to misclassify them with high confidence. To
emphasize generalization, our dataset by design does not come paired with a
training set. It contains 8,060 images spread across 36 categories, 29 of which
appear in ImageNet. The best Top-1 accuracy on our dataset is around 60%, which
is much lower than the best Top-1 accuracy of 91% on ImageNet. We find that popular
vision APIs perform very poorly in detecting objects over D2O categories such as
“faces”, “cars”, and “cats”. Our dataset also comes with a “miscellaneous” category,
over which we test image tagging algorithms. Overall, our investigations
demonstrate that the D2O test set has the right level of difficulty and is predictive
of the average-case performance of models. It can challenge object recognition
models for years to come and can spur more research in this fundamental area.
Data and code are publicly available at [Masked].
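
As a rough illustration of the evaluation protocol the abstract describes (scoring pretrained ImageNet classifiers on D2O categories by Top-1 accuracy), the sketch below measures Top-1 accuracy of a pretrained model on a folder of test images. The directory layout (`d2o_test/<category>/...`), the `d2o_to_imagenet` category mapping, and the choice of ResNet-50 are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch: Top-1 accuracy of a pretrained ImageNet classifier on a
# D2O-style test set. Folder layout and class mapping are assumptions.
import torch
from torchvision import models, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumed layout: one sub-folder per D2O category, e.g. d2o_test/cats/001.jpg
dataset = ImageFolder("d2o_test", transform=preprocess)
loader = DataLoader(dataset, batch_size=64, num_workers=4)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

# Hypothetical mapping from each D2O folder index to the set of ImageNet class
# indices counted as correct (e.g. all breed classes for a generic "cats"
# category). The identity mapping here is only a placeholder.
d2o_to_imagenet = {i: {i} for i in range(len(dataset.classes))}

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        for pred, label in zip(preds.tolist(), labels.tolist()):
            correct += pred in d2o_to_imagenet[label]
            total += 1

print(f"Top-1 accuracy: {correct / total:.1%}")
```

A prediction is counted as correct if it falls in the set of ImageNet classes matching the image's D2O category, since D2O categories are coarser than ImageNet's 1,000 classes.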
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Infrastructure (eg, datasets, competitions, implementations, libraries)