Keywords: Depth estimation, adversarial attack
TL;DR: A method to find camera viewpoints where monocular depth estimation algorithms fail.
Abstract: Monocular depth estimation models have advanced significantly in recent years, to the point that they appear to provide accurate depth information for any arbitrary scene. In this work, we develop a framework to test whether this is indeed the case by stress-testing these models in different indoor environments. Specifically, our goal is to study how robust various models are to changes in camera viewpoint. Rather than conducting an exhaustive search over all possible viewpoints in a scene, we employ adversarial attacks that leverage a differentiable rendering framework applied to 3D assets. Starting from a given camera position, we optimize the camera's rotation and translation parameters through backpropagation so as to maximize the prediction error. To ensure meaningful failure cases, we implement strategies that prevent trivial adversarial shortcuts. To make all of this possible, we also construct a dataset comprising complex, efficiently renderable 3D assets, enabling rigorous evaluation of four recently published depth estimation models. The key insight from our experiments is that all of these models, including the very recent state of the art, fail on the adversarial viewpoints discovered through our framework, i.e., their predictions deviate significantly from the ground-truth depth. Our work establishes a new robustness benchmark for the monocular depth estimation task.
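To make the abstract's optimization concrete, here is a minimal PyTorch sketch of the adversarial viewpoint search it describes. The names `render_rgb_and_depth` (a differentiable renderer over a 3D asset, returning an image and its ground-truth depth) and `depth_model` (a pretrained monocular depth estimator) are hypothetical stand-ins, and the paper's anti-shortcut strategies and exact loss are omitted; this is only an illustration of the gradient-ascent idea, not the authors' implementation.

```python
import torch

def find_adversarial_viewpoint(render_rgb_and_depth, depth_model,
                               rot_init, trans_init, steps=200, lr=1e-2):
    # Camera rotation (e.g., an axis-angle vector) and translation,
    # optimized directly via backpropagation through the renderer.
    rot = rot_init.clone().requires_grad_(True)
    trans = trans_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([rot, trans], lr=lr)

    for _ in range(steps):
        # Render is assumed differentiable w.r.t. the camera pose.
        rgb, gt_depth = render_rgb_and_depth(rot, trans)
        pred_depth = depth_model(rgb)
        # Ascend on prediction error by minimizing its negative.
        loss = -torch.mean(torch.abs(pred_depth - gt_depth))
        opt.zero_grad()
        loss.backward()
        opt.step()

    return rot.detach(), trans.detach()
```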
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 14926