Universal and Efficient Detection of Adversarial Data through Nonuniform Impact on Network Layers

Published: 20 Jun 2025, Last Modified: 20 Jun 2025
Accepted by TMLR
License: CC BY 4.0
Abstract: Deep Neural Networks (DNNs) are notoriously vulnerable to adversarial input designs with limited noise budgets. While numerous successful attacks with subtle modifications to the original input have been proposed, defense techniques against these attacks are relatively understudied. Existing defense approaches either focus on improving DNN robustness by negating the effects of perturbations or use a secondary model to detect adversarial data. Although both directions are important, the attack detection approach studied in this work provides a more practical defense than the robustness approach. We show that existing detection methods are either ineffective against state-of-the-art attack techniques or computationally inefficient for real-time processing. We propose a novel, universal, and efficient method to detect adversarial examples by analyzing the varying degrees of impact that attacks have on different DNN layers. Our method trains a lightweight regression model that predicts deeper-layer features from early-layer features and uses the prediction error to detect adversarial samples. Through theoretical arguments and extensive experiments, we demonstrate that our detection method is highly effective, computationally efficient enough for real-time processing, compatible with any DNN architecture, and applicable across different domains, such as image, video, and audio.
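The following is a minimal sketch of the layer-regression idea summarized in the abstract: a lightweight regressor maps early-layer features to predicted deep-layer features, and the prediction error serves as the detection score. This is an illustrative assumption-based reconstruction, not the authors' released implementation (see the linked repository for that); the names `LayerRegressor`, `detection_score`, and the threshold `tau` are hypothetical.

```python
import torch
import torch.nn as nn

class LayerRegressor(nn.Module):
    """Lightweight regression model: predicts deeper-layer features
    from early-layer features of a frozen target DNN (hypothetical sketch)."""
    def __init__(self, early_dim: int, deep_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(early_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, deep_dim),
        )

    def forward(self, early_feats: torch.Tensor) -> torch.Tensor:
        return self.net(early_feats)

def detection_score(regressor: LayerRegressor,
                    early_feats: torch.Tensor,
                    deep_feats: torch.Tensor) -> torch.Tensor:
    """Per-sample prediction error between regressed and actual deep features.
    Because attacks impact deeper layers more strongly than early ones,
    adversarial inputs are expected to yield larger errors."""
    with torch.no_grad():
        pred = regressor(early_feats.flatten(1))
        err = torch.norm(pred - deep_feats.flatten(1), dim=1)
    # Flag a sample as adversarial if err > tau, where tau is a threshold
    # calibrated on clean data (e.g., a high percentile of clean errors).
    return err
```

Under this reading, the regressor would be trained only on clean samples (minimizing the same prediction error), so the detector needs no adversarial examples at training time and adds only one small forward pass at inference.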
Submission Length: Regular submission (no more than 12 pages of main content)
Code: https://github.com/furkanmumcu/Layer-Regression
Assigned Action Editor: ~Tim_Genewein1
Submission Number: 4597