RobustBlack: Challenging Black-Box Adversarial Attacks on State-of-the-Art Defenses

TMLR Paper 4286 Authors

21 Feb 2025 (modified: 31 Mar 2025) · Under review for TMLR · CC BY 4.0
Abstract: Although adversarial robustness has been extensively studied in white-box settings, recent advances in black-box attacks (including transfer- and query-based approaches) are primarily benchmarked against weak defenses, leaving a significant gap in the evaluation of their effectiveness against more recent, moderately robust models (e.g., those featured on the RobustBench leaderboard). In this paper, we question this lack of attention from black-box attacks to robust models. We benchmark the effectiveness of recent black-box attacks against both top-performing and standard defense mechanisms on the ImageNet dataset. Our empirical evaluation reveals the following key findings: (1) the most advanced black-box attacks struggle to succeed even against simple adversarially trained models; (2) robust models optimized to withstand strong white-box attacks, such as AutoAttack, also exhibit enhanced resilience against black-box attacks; and (3) robustness alignment between the surrogate models and the target model can significantly impact the success rate of transfer-based attacks.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: In the revised manuscript (now 15 pages, as requested), all reviewer comments have been considered and accommodated, including a rework of the related-work methodology for better readability and additional proofreading. In particular, we have added new results with larger budgets, as well as new baselines. Due to space limitations, the revised main paper includes only a subset of the figures and results; a more detailed analysis and discussion is provided in the paper's appendices. In the revised manuscript, we use color coding as follows:
- Magenta: content newly added to the manuscript.
- Orange: content reformulated according to the reviews.
Overall, our new results confirm the claims of our publication:
- Simple adversarial training is sufficient to counter recent SoTA black-box attacks.
- Increasing the computation budget (iterations and queries) has limited impact against robustified models.
- TREMBA is the only attack for which increasing the epsilon budget leads to a significant improvement in success rate.
Assigned Action Editor: ~Dit-Yan_Yeung2
Submission Number: 4286
