Evaluating the Adversarial Robustness of CNNs Layer by Layer

TMLR Paper5975 Authors

23 Sept 2025 (modified: 01 Dec 2025), Under review for TMLR, CC BY 4.0
Abstract: To measure the adversarial robustness of a feature extractor, Bhagoji et al. introduced a distance on example spaces that measures the minimum perturbation of a pair of examples needed to make their feature extractor outputs identical. They related these distances to the best possible robust accuracy of any classifier using the feature extractor. Viewing the initial layers of a neural network as a feature extractor therefore provides a method of attributing the adversarial vulnerability of the classifier as a whole to individual layers. However, this framework views any injective feature extractor as perfectly robust: any bad choice of feature representation can be undone by later layers. Thus the framework attributes all adversarial vulnerability to the layers that perform dimensionality reduction. Feature spaces at intermediate layers of convolutional neural networks are generally much larger than input spaces, so this methodology provides no information about the contributions of individual layers to the overall robustness of the network. We extend the framework to evaluate feature extractors with high-dimensional output spaces by composing them with a random linear projection to a lower-dimensional space. This yields non-trivial information about the quality of the feature space representations for building an adversarially robust classifier.
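The composition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian entries, the 1/sqrt(out_dim) scaling, and the names `make_random_projection`, `compose`, and `toy_extractor` are all assumptions made for illustration. The toy extractor is injective and expands the dimension, so on its own the framework would treat it as perfectly robust; composing it with a projection to a smaller space removes injectivity.

```python
import random


def make_random_projection(in_dim, out_dim, seed=0):
    """Return a fixed random linear map from R^in_dim to R^out_dim.

    Assumption: i.i.d. Gaussian entries scaled by 1/sqrt(out_dim).
    The abstract does not say how the projection is sampled.
    """
    rng = random.Random(seed)
    scale = 1.0 / out_dim ** 0.5
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
        for _ in range(out_dim)
    ]

    def project(features):
        if len(features) != in_dim:
            raise ValueError("expected %d features, got %d" % (in_dim, len(features)))
        # Plain matrix-vector product, one output coordinate per row.
        return [sum(w * x for w, x in zip(row, features)) for row in matrix]

    return project


def compose(feature_extractor, projection):
    """Feature extractor followed by the random projection."""
    return lambda x: projection(feature_extractor(x))


def toy_extractor(x):
    """Hypothetical stand-in for the initial layers of a CNN.

    Maps R^n to R^(2n) via ReLU(x) and ReLU(-x). It is injective,
    because x can be recovered as the difference of the two halves.
    """
    return [max(v, 0.0) for v in x] + [max(-v, 0.0) for v in x]


# 4-dimensional input, 8-dimensional features, projected down to 2 dimensions.
projection = make_random_projection(in_dim=8, out_dim=2, seed=1)
projected_extractor = compose(toy_extractor, projection)
low_dim_features = projected_extractor([1.0, -2.0, 0.5, 3.0])
```

Because the output dimension (2) is smaller than the input dimension (4), distinct inputs can now share an output, which is what makes a collision-based distance informative for the composed map.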
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Edits based on initial review feedback: Moved the Related Work section up and added a new citation with discussion. Rewrote parts of the Framework section for clarity. Moved definitions new to this paper out of the Framework section. Added a subsection on Computational Complexity. Added experiments on sample size and on variance due to the random projection. Improved the explanation of the FAB experiments and their interpretation. Added limitations and broader impact subsections.
Assigned Action Editor: ~Venkatesh_Babu_Radhakrishnan2
Submission Number: 5975