SAIF: Sparse Adversarial and Imperceptible Attack Framework

TMLR Paper4435 Authors

10 Mar 2025 (modified: 22 Mar 2025) | Under review for TMLR | CC BY 4.0
Abstract: Adversarial attacks hamper the decision-making ability of neural networks by perturbing the input signal. For instance, adding calculated small distortions to images can deceive a well-trained image classification network. In this work, we propose a novel attack technique called Sparse Adversarial and Imperceptible Attack Framework (SAIF). Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a few pixels and leverage these sparse attacks to reveal the vulnerability of classifiers. We use the Frank-Wolfe (conditional gradient) algorithm to simultaneously optimize the attack perturbations for bounded magnitude and sparsity with $O(1/\sqrt{T})$ convergence. Empirical results show that SAIF computes highly imperceptible and interpretable adversarial examples, and largely outperforms state-of-the-art sparse attack methods on ImageNet and CIFAR-10.
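As a rough illustration of the Frank-Wolfe idea mentioned in the abstract, the sketch below runs a generic conditional gradient loop over a set that bounds both magnitude and total mass. It is not the SAIF algorithm. The constraint set {s : ||s||_inf <= eps, ||s||_1 <= k*eps}, the toy quadratic objective, the step size 2/(t+2), and all names (`lmo_sparse_box`, `frank_wolfe`, `toy_grad`, `target`) are assumptions made for this example only.

```python
def lmo_sparse_box(grad, k, eps):
    """Linear minimization oracle (assumed constraint set, not from the paper):
    argmin over {s : ||s||_inf <= eps, ||s||_1 <= k*eps} of <grad, s>.
    The minimizer puts magnitude eps on the k largest-magnitude gradient
    coordinates, with sign opposite to the gradient."""
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    s = [0.0] * len(grad)
    for i in top:
        if grad[i] > 0:
            s[i] = -eps
        elif grad[i] < 0:
            s[i] = eps
    return s


def frank_wolfe(grad_fn, dim, k, eps, steps):
    """Generic Frank-Wolfe loop: move toward the oracle's vertex each step.
    Every iterate is a convex combination of feasible points, so it stays
    inside the constraint set without any projection."""
    x = [0.0] * dim
    for t in range(steps):
        g = grad_fn(x)
        s = lmo_sparse_box(g, k, eps)
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size (assumption)
        x = [xi + gamma * (si - xi) for xi, si in zip(x, s)]
    return x


# Toy stand-in for an attack loss: f(x) = 0.5 * ||x - target||^2.
target = [3.0, -2.0, 0.1, 0.0, 0.05]


def toy_grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


delta = frank_wolfe(toy_grad, dim=5, k=2, eps=0.5, steps=200)
```

In this toy problem the perturbation `delta` concentrates on the two coordinates where the gradient is largest. In an actual attack, `grad_fn` would be the gradient of a classifier loss with respect to the input perturbation, and Frank-Wolfe iterates are only guaranteed to be feasible for the convex set, not exactly k-sparse.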
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Daniel_M_Roy1
Submission Number: 4435