Geon3D: Exploiting 3D Shape Bias towards Building Robust Machine Vision

Yutaro Yamada; Yuval Kluger; Sahand Negahban; Ilker Yildirim

Geon3D: Exploiting 3D Shape Bias towards Building Robust Machine Vision

Yutaro Yamada, Yuval Kluger, Sahand Negahban, Ilker Yildirim

29 Sept 2021 (modified: 13 Feb 2023)ICLR 2022 Conference Withdrawn SubmissionReaders: Everyone

Keywords: robustness, common corruptions, adversarial examples, robust vision, vision science

Abstract: Robustness research in machine vision faces a challenge. Many variants of ImageNet-scale robustness benchmarks have been proposed, only to reveal that current vision systems fail under distributional shifts. Although aiming for higher robustness accuracy on these benchmarks is important, we also observe that simply using larger models and larger training datasets may not lead to true robustness, demanding further innovation. To tackle the problem from a new perspective, we encourage closer collaboration between the robustness and 3D vision communities. This proposal is inspired by human vision, which is surprisingly robust to environ-mental variation, including both naturally occurring disturbances (e.g., fog, snow, occlusion) and artificial corruptions (e.g., adversarial examples). We hypothesize that such robustness, at least in part, arises from our ability to infer 3D geometry from 2D retinal projections---the ability to go from images to their underlying causes, including the 3D scene. In this work, we take a first step toward testing this hypothesis by viewing 3D reconstruction as a pretraining method for building more robust vision systems. We introduce a novel dataset called Geon3D, which is derived from objects that emphasize variation across shape features that the human visual system is thought to be particularly sensitive. This dataset enables, for the first time, a controlled setting where we can isolate the effect of “3D shape bias” in robustifying neural networks, and informs new approaches for increasing robustness by exploiting 3D vision tasks. Using Geon3D, we find that CNNs pretrained on 3D reconstruction are more resilient to viewpoint change, rotation, and shift than regular CNNs. Further, when combined with adversarial training, 3D reconstruction pretrained models improve adversarial and common corruption robustness over vanilla adversarially-trained models. We hope that our findings and dataset will encourage exploitation of synergies between the robustness researchers, 3D computer vision community, and computational perception researchers in cognitive science, paving a way for achieving human-like robustness under complex, real-world stimuli conditions.

Supplementary Material: zip

10 Replies

Loading