Abstract: Adversarial attacks pose severe threats to the integrity of deep neural networks (DNNs), especially in resource-constrained systems where traditional defenses are computationally expensive. Existing black-box defenses exploit hardware characteristics of adversarial attacks, such as Hardware Performance Counter (HPC) measurements, but they often require repeated execution of multiple target model inferences to detect an attack. In this work, we put forth a differing perspective: while detection strategies involving multiple target model inferences appear successful in isolation, they impose unacceptable and inhibitory requirements. Specifically, we argue that these works require cleaning the micro-architectural state of hardware, such as the cache and the branch predictor, after each inference. This in turn degrades the performance not only of the adversarial attack detector, but also of the overall system. We therefore propose MIRAGE, a novel and lightweight HPC-based detection strategy that does not require cleaning the micro-architectural state of caches or branch predictors. We train a convolutional neural network (CNN) on these HPC signals to classify inputs as benign or adversarial in a single shot, making our approach practical for online systems while allowing full use of hardware optimizations for performance uplifts. Experiments on the CIFAR-10 and MNIST datasets show that our methodology not only detects adversarial samples with greater than 96% accuracy, but also imposes a minimal timing overhead of 60 ms and maintains high throughput. This makes our solution well-suited for embedded and edge-AI scenarios.
External IDs: dblp:conf/iccad/ChatterjeeTMHM25