Abstract: Deep neural networks have found widespread application in critical fields but remain vulnerable to adversarial attacks. Existing detection methods aim to achieve defense without modifying the model, but they generally struggle to generalize to unseen attacks. To address this limitation, we investigate the underlying principles of max-loss and min-distance adversarial attacks and uncover a strong positive correlation among perturbation magnitude, prediction confidence, and the distance to the decision boundary. Building on this insight, we introduce Adversarial Detection via Adversarial Sensitivity (ADAS), a novel approach that detects adversarial attacks by analyzing the sensitivity of a model's predictions to perturbation magnitude. ADAS estimates the distance to the decision boundary through sensitivity analysis, simulating adversarial attacks on input samples and identifying anomalies indicative of adversarial manipulation. Extensive experiments demonstrate the robustness and generalizability of ADAS across diverse and previously unseen adversarial attack scenarios, establishing its efficacy as a versatile and reliable detection framework.
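To make the core idea concrete, below is a minimal, hypothetical sketch of the kind of sensitivity analysis the abstract describes: simulate small attacks of increasing magnitude and use the smallest magnitude that flips the prediction as a proxy for the distance to the decision boundary. The function and parameter names (`estimate_boundary_distance`, `looks_adversarial`, the epsilon grid and threshold) are illustrative assumptions, not taken from the paper, and the FGSM-style perturbation stands in for whatever attack simulation ADAS actually uses.

```python
# Hypothetical sketch of the detection principle described above (not the
# authors' implementation). Assumes a PyTorch classifier and a single input
# x of shape (1, C, H, W) with pixel values in [0, 1].
import torch
import torch.nn.functional as F

def estimate_boundary_distance(model, x,
                               eps_grid=(0.001, 0.002, 0.004, 0.008, 0.016, 0.032)):
    """Return the smallest FGSM-style step size that changes the predicted
    label, used as a crude proxy for the distance to the decision boundary."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    label = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, label)
    grad = torch.autograd.grad(loss, x)[0]
    direction = grad.sign()                      # attack direction (FGSM sign)
    for eps in eps_grid:                         # sweep perturbation magnitudes
        x_adv = (x + eps * direction).clamp(0, 1)
        with torch.no_grad():
            if model(x_adv).argmax(dim=1) != label:
                return eps                       # prediction flipped here
    return float("inf")                          # prediction is insensitive

def looks_adversarial(model, x, threshold=0.004):
    """Flag the input if an unusually small simulated attack already flips
    its prediction (i.e., it sits suspiciously close to the boundary)."""
    return estimate_boundary_distance(model, x) <= threshold
```

In this sketch, adversarial examples produced by max-loss or min-distance attacks tend to lie near the decision boundary, so their estimated flip magnitude is anomalously small compared with clean inputs; the threshold would in practice be calibrated on held-out clean data rather than fixed by hand.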