MALT Powers Up Adversarial Attacks

Published: 25 Sept 2024, Last Modified: 06 Nov 2024 · NeurIPS 2024 poster · CC BY 4.0
Keywords: Adversarial Examples, Robustness, Neural Networks, Classification, Adversarial Attacks
TL;DR: We present MALT (Mesoscopic Almost Linearity Targeting), a novel adversarial attack that outperforms the current SOTA AutoAttack on several datasets and robust models while being five times faster.
Abstract: Current adversarial attacks for multi-class classifiers choose potential adversarial target classes naively, based on the classifier's confidence levels. We present a novel adversarial targeting method, \textit{MALT - Mesoscopic Almost Linearity Targeting}, based on local almost-linearity assumptions. Our attack outperforms the current state-of-the-art attack, AutoAttack, on the standard benchmark datasets CIFAR-100 and ImageNet and against different robust models. In particular, our attack uses a \emph{five times faster} attack strategy than AutoAttack's, while matching AutoAttack's successes and additionally attacking samples that were previously out of reach. We further prove formally and demonstrate empirically that our targeting method, although inspired by linear predictors, also applies to non-linear models.
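To illustrate the targeting idea described in the abstract, the sketch below ranks candidate target classes by an estimated distance to each class's decision boundary under a local linear approximation, instead of by raw confidence. This is a minimal illustration, not the authors' exact MALT algorithm; the function name, the single-image interface, and the use of the logit-gap-over-gradient-norm ratio as the ranking score are assumptions made for this example.

```python
import torch

def linear_approx_target_ranking(model, x, true_label, top_k=5):
    """Illustrative sketch (assumed interface, not the paper's exact method):
    rank candidate target classes by the estimated distance to the decision
    boundary under a local linear approximation of the classifier."""
    # Work on a leaf tensor so we can take gradients w.r.t. the input.
    x = x.detach().clone().requires_grad_(True)
    logits = model(x.unsqueeze(0)).squeeze(0)  # shape: (num_classes,)
    num_classes = logits.shape[0]

    scores = []
    for j in range(num_classes):
        if j == true_label:
            continue
        # Logit gap between the true class and candidate target j.
        gap = logits[true_label] - logits[j]
        # Gradient of the gap w.r.t. the input; under (almost) linearity,
        # gap / ||grad|| approximates the distance to the j-vs-true boundary.
        grad = torch.autograd.grad(gap, x, retain_graph=True)[0]
        dist = gap.detach() / (grad.norm() + 1e-12)
        scores.append((dist.item(), j))

    # Smaller estimated distance => easier target under the linear model.
    scores.sort(key=lambda t: t[0])
    return [j for _, j in scores[:top_k]]
```

The returned classes could then be handed to any targeted attack; confidence-based targeting would instead simply pick the classes with the highest logits.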
Primary Area: Safety in machine learning
Submission Number: 6725