Abstract: Estimating depth from a single image is an important scene-understanding task in computer vision. With the advent of Deep Learning and Convolutional Neural Networks, remarkably high accuracies have been achieved on this task. Advances in model optimization have made it possible to deploy these models on edge devices, enabling efficient depth estimation in safety-critical applications such as robots, rovers, drones, and even self-driving vehicles. However, these models are susceptible to attacks from malicious adversaries, who aim to distort the model's output for a seemingly clean image by adding minute perturbations. In real-world scenarios, the most plausible attack is the adversarial patch, which can be printed and deployed as a physical adversarial attack against Deep Learning models. In the case of Monocular Depth Estimation, we show that small adversarial patches, ranging from 0.7% to 5% of the image size, greatly worsen model performance. It is therefore essential that these models be made robust through defense mechanisms that counter malicious inputs without reducing performance on clean images. Moreover, the defense mechanism must be computationally efficient to support real-time inference on edge devices. In this work, we propose the first defense mechanism against adversarial patches for a regression network, in the context of Monocular Depth Estimation on an edge device. The defense adds an overhead of only 38 milliseconds on a Raspberry Pi 3 Model B, maintains performance on clean images, and achieves near-clean levels of performance on adversarial inputs.