Abstract: Creating secure systems is challenging: defenders have to be right all of the time, but attackers only need to be right once. Security evaluations therefore need to employ a variety of attack strategies to identify gaps in a system's defensive posture. In Machine Learning (ML), security evaluations often focus on the model, either evaluating as many known attacks as possible or using an ensemble of attacks assumed to be representative, to ensure coverage across many possible attack scenarios. Even so, it is not uncommon for evaluators, e.g., reviewers of a defense proposal, to be presented with a security evaluation built on an attack ensemble and still request additional attack evaluations.
In this paper, we study the effectiveness of these additional evaluations and re-examine the efficiency of current adversarial robustness evaluation approaches for classification models. Although security evaluations have become increasingly costly with growing model scale and dataset size, defense evaluations still involve running numerous attacks, and reviewers may request yet more. The rationale is safety in numbers: additional attacks might reveal a lack of diversity in the attack scenarios explored by the original evaluation. We examine the question: "How much more information is learned about the robustness of a defense after the first attack evaluation?" Through three possible lenses of attack diversity, we show that neither gradient-based nor gradient-free attacks exhibit any notable variation within their respective classes. A single well-performing attack from each class is enough to make a general determination of robustness. Compared to AutoAttack, a state-of-the-art and widely used four-attack ensemble, a simple two-attack ensemble consisting of one high-performing attack from each class differs in evaluation precision by only 0.79%.
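To make the two-attack ensemble idea concrete, below is a minimal sketch of how such an evaluation could be wired up. It is not the paper's exact protocol: the specific attacks (PGD as the gradient-based attack, Square Attack as the gradient-free attack), the torchattacks library, and the hyperparameters (eps, step count, query budget) are illustrative assumptions; an example counts as robust only if it survives both attacks.

```python
# Sketch (assumed setup, not the paper's exact protocol): robust-accuracy
# evaluation with a two-attack ensemble using the torchattacks library,
# one gradient-based attack (PGD) and one gradient-free attack (Square Attack).
import torch
import torchattacks


def two_attack_robust_accuracy(model, loader, device, eps=8 / 255):
    model.eval()
    # One representative attack per class; hyperparameters are illustrative.
    gradient_attack = torchattacks.PGD(model, eps=eps, alpha=2 / 255, steps=50)
    gradient_free_attack = torchattacks.Square(model, norm='Linf', eps=eps, n_queries=5000)

    robust, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        adv_grad = gradient_attack(images, labels)
        adv_free = gradient_free_attack(images, labels)
        with torch.no_grad():
            pred_grad = model(adv_grad).argmax(dim=1)
            pred_free = model(adv_free).argmax(dim=1)
        # An example is robust only if it is classified correctly under both attacks.
        robust += ((pred_grad == labels) & (pred_free == labels)).sum().item()
        total += labels.size(0)
    return robust / total
```

The worst-case accounting (logical AND over the two attacks) mirrors how ensemble evaluations such as AutoAttack report robust accuracy, so the two-attack number is directly comparable to a larger ensemble's result.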
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Haoliang_Li2
Submission Number: 5838