The Robustness Limits of SoTA Vision Models to Natural Variation

Mark Ibrahim; Quentin Garrido; Ari S. Morcos; Diane Bouchacourt

22 Sept 2022 (modified: 14 Jan 2026); ICLR 2023 Conference Withdrawn Submission; Readers: Everyone

Keywords: robustness, computer vision, generalization, deep learning

TL;DR: Even today's best vision models are not robust and struggle to generalize to changes in factors such as pose, size, and position.

Abstract: Recent state-of-the-art vision models introduced new architectures, learning paradigms, and larger pretraining data, leading to impressive performance on tasks such as classification. While previous generations of vision models were shown to lack robustness to factors such as pose, it is unclear to what extent this next generation of models is more robust. To study this question, we develop a dataset of more than 7 million images with controlled changes in pose, position, background, lighting, and size. We study not only how robust recent state-of-the-art models are, but also the extent to which models can generalize to variation in these factors when it is present during training. We consider a catalog of recent vision models, including vision transformers (ViT), self-supervised models such as masked autoencoders (MAE), and models trained on larger datasets such as CLIP. We find that, out of the box, even today’s best models are not robust to common changes in pose, size, and background. When some samples varied during training, we found that models required a significant portion of diversity to generalize, though robustness did eventually improve. When diversity was seen for only some classes, however, we found that models did not generalize to other classes unless those classes were very similar to the ones seen varying during training. We hope our work will shed further light on the blind spots of SoTA models and spur the development of more robust vision models.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (eg, AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/the-robustness-limits-of-sota-vision-models/code)
