Growing ObjectNet: Adding speech, VQA, occlusion, and measuring dataset difficulty

Published: 12 Jul 2022, Last Modified: 05 May 2023 · Shift Happens 2022 Poster
Abstract: Building more difficult datasets is largely an ad-hoc enterprise, generally relying on scale from the web or focusing on particular domains thought to be challenging. ObjectNet is an attempt to create a more difficult dataset in a systematic way, one that eliminates biases that may artificially inflate machine performance. ObjectNet images are designed to decorrelate objects from their backgrounds and to have randomized object orientations and viewpoints. ObjectNet appears to be much more difficult for machines. Spoken ObjectNet is a retrieval benchmark constructed from spoken descriptions of ObjectNet images; these descriptions are also being used to create a captioning and VQA benchmark. In each case, large performance drops were observed. The next variant of ObjectNet will focus on real-world occlusions, since models are suspected to be brittle when shown partially-occluded objects. Using large-scale psychophysics on ObjectNet, we have constructed a new objective difficulty benchmark applicable to any dataset: the minimum presentation time of an image before humans can reliably recognize the object it contains. This difficulty metric is well predicted by quantities computable from the activations of models, although not necessarily by their ultimate performance. We hope that this suite of benchmarks will enable more robust models, provide better images for neuroscientific and behavioral experiments, and contribute to a systematic understanding of dataset difficulty and of progress in computer vision.
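To make the difficulty metric concrete, here is a minimal sketch, not the authors' code, of how a per-image minimum presentation time could be estimated from psychophysics trials: fit a logistic psychometric curve of recognition accuracy versus presentation time, then invert it at a reliability criterion. The function names, the 80% criterion, and the trial data below are all illustrative assumptions.

```python
# A minimal sketch (assumed, not the authors' pipeline) of estimating the
# minimum presentation time at which humans reliably recognize an object.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(t_ms, log_t50, slope):
    """Logistic accuracy curve over log presentation time (ms)."""
    return 1.0 / (1.0 + np.exp(-slope * (np.log(t_ms) - log_t50)))

def min_presentation_time(times_ms, correct, criterion=0.8):
    """Shortest presentation time (ms) at which fitted accuracy reaches
    `criterion`, from (presentation time, correct/incorrect) trials."""
    times_ms = np.asarray(times_ms, float)
    correct = np.asarray(correct, float)
    p0 = [np.log(np.median(times_ms)), 2.0]  # midpoint at median tested time
    (log_t50, slope), _ = curve_fit(psychometric, times_ms, correct,
                                    p0=p0, maxfev=10000)
    logit = np.log(criterion / (1.0 - criterion))  # invert the logistic
    return float(np.exp(log_t50 + logit / slope))

# Hypothetical trials for one image, presentation times from 17 to 266 ms.
times = [17, 17, 33, 33, 67, 67, 133, 133, 266, 266]
hits  = [ 0,  0,  0,  1,  0,  1,   1,   1,   1,   1]
print(f"minimum presentation time ~ {min_presentation_time(times, hits):.0f} ms")
```

In large-scale psychophysics, trials at each presentation time would be pooled across many participants per image; the curve family and criterion here are placeholders for whatever the actual protocol specifies.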
Submission Type: Extended abstract
Co Submission: Yes, I am also submitting to the NeurIPS Datasets and Benchmarks track.