Abstract: The widespread adoption of deep neural networks (DNNs) has resulted in their integration into critical autonomous systems. However, recent studies have emphasized the vulnerability of DNNs to fault injection attacks, specifically adversarial weight bit-flips. These attacks exploit memory systems to manipulate network parameters and compromise the performance of DNNs. Consequently, it is crucial to develop efficient online detectors to ensure the reliable and secure operation of DNNs on edge devices. In this paper, we propose a functional pattern-based approach that efficiently detects attacks on DNNs. The methodology employs functional test patterns crafted by a post-hoc black-box model explanation method to execute run-time integrity checks on the model. To the best of our knowledge, this is the first work that utilizes explainability to improve the security of edge DNN architectures in mission mode. The effectiveness of the proposed approach has been validated through extensive experiments conducted on various standard model-dataset configurations, underscoring its broad applicability. The results indicate that the proposed technique can attain a detection rate of up to 100% with zero false positives using as few as a single pattern.