Gradient Manifold Geometry as a Signature for Adversarial Detection

18 Sept 2025 (modified: 27 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Adversarial Detection, Intrinsic Dimensionality, Gradient Manifold Geometry, Adversarial Robustness, Deep Learning, Machine Learning Security
TL;DR: We find that adversarial examples generate surprisingly low-dimensional gradient manifolds, creating a robust geometric fingerprint for detection.
Abstract: Despite their remarkable performance, deep neural networks exhibit a critical vulnerability: small adversarial perturbations can drastically alter predictions, making robust detection paramount for safety-critical applications such as autonomous driving. To address this vulnerability, this paper investigates the geometric properties of a model's input loss landscape by analyzing the Intrinsic Dimensionality (ID) of its loss gradients, where ID quantifies the minimal number of coordinates required to describe data on its underlying manifold. We reveal a distinct and consistent difference between the ID of natural and adversarial data, which forms the basis of our proposed detection method. Our approach is validated in two distinct operational scenarios: a batch-wise setting, for identifying malicious groups of data on datasets such as MNIST and SVHN, and, more critically, an individual-sample setting, where we establish new state-of-the-art results on challenging benchmarks such as CIFAR-10 and MS COCO. Our detector significantly surpasses existing methods against a wide array of attacks, including CW and AutoAttack, achieving detection rates consistently above 92% on CIFAR-10 and underscoring that intrinsic dimensionality is a powerful fingerprint for adversarial detection across diverse datasets and attack strategies.
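
To make the pipeline described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It computes per-sample loss gradients with respect to the inputs (the paper's gradients may instead be taken with respect to model parameters), estimates their intrinsic dimensionality with the Levina-Bickel maximum-likelihood estimator as a stand-in for whatever ID estimator the paper uses, and flags a batch whose gradient manifold is suspiciously low-dimensional. The names `loss_gradients`, `looks_adversarial`, and the threshold value are hypothetical.

```python
# Minimal sketch (assumptions: a trained PyTorch classifier `model`, input
# gradients as the feature, Levina-Bickel MLE as the ID estimator, and an
# illustrative detection threshold).
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.neighbors import NearestNeighbors

def loss_gradients(model, x):
    """Per-sample gradients of the cross-entropy loss w.r.t. the inputs,
    using the model's own predictions as labels (no ground truth needed)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    loss = F.cross_entropy(logits, logits.argmax(dim=1))
    grads, = torch.autograd.grad(loss, x)
    return grads.flatten(start_dim=1).detach().cpu().numpy()

def mle_intrinsic_dim(points, k=10):
    """Levina-Bickel maximum-likelihood ID estimate from k-NN distances."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
    dists, _ = nn.kneighbors(points)   # column 0 is each point itself
    dists = dists[:, 1:]
    # Per point: inverse mean log-ratio of the k-th to the j-th NN distance.
    logs = np.log(dists[:, -1:] / dists[:, :-1])
    return float(np.mean((k - 1) / np.sum(logs, axis=1)))

def looks_adversarial(model, x, id_threshold=5.0):
    """Flag a batch whose gradient manifold is unusually low-dimensional,
    matching the paper's observation that adversarial data yields lower ID."""
    return mle_intrinsic_dim(loss_gradients(model, x)) < id_threshold
```

The design choice to threshold the batch-level ID mirrors the batch-wise scenario in the abstract; the individual-sample setting would instead need a per-sample ID statistic (e.g., the local estimate around each gradient), which this sketch does not attempt.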
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 11615