Query Efficient Black-Box Adversarial Attack with Automatic Region Selection

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Black-box attack, DNNs, Group sparsity
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Deep neural networks (DNNs) have been shown to be vulnerable to black-box attacks, in which small perturbations are added to input images without access to any internal information of the model. However, current black-box adversarial attack methods are limited to attacks on the entire image region, pixel-wise sparse attacks, or region-wise attacks. In this paper, we investigate region-wise adversarial attacks in the black-box setting with automatic region selection and controllable imperceptibility. Technically, we formulate the problem as an optimization problem with $\ell_0^{\mathcal{G}}$ and $\ell_\infty$ constraints, where $\ell_0^{\mathcal{G}}$ denotes structured sparsity defined on a collection of groups $\mathcal{G}$, which automatically detects the regions that need to be perturbed. We solve the problem using natural evolution strategies with search gradients. If $\mathcal{G}$ is non-overlapping, we provide a closed-form solution to the first-order Taylor approximation of the objective function with the search gradient under $\ell_0^{\mathcal{G}}$ and $\ell_\infty$ constraints (FTAS$\ell_{0+\infty}^{\mathcal{G}}$). If $\mathcal{G}$ is overlapping, FTAS$\ell_{0+\infty}^{\mathcal{G}}$ becomes NP-hard, and we provide an approximate solution via greedy selection over the groups in $\mathcal{G}$. Our method consists of multiple updates with the closed-form/approximate solution to FTAS$\ell_{0+\infty}^{\mathcal{G}}$, and we provide a convergence analysis of the solution under standard assumptions. Experimental results on different datasets indicate that our method requires fewer perturbations than global-region attacks and fewer queries than region-wise attacks, while offering interpretability of vulnerable regions that pixel-wise attacks cannot provide.
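To make the attack structure concrete, the sketch below illustrates the two ingredients the abstract names for the non-overlapping case: a natural-evolution-strategies (NES) search-gradient estimate of a black-box loss, followed by a group-sparse hard-thresholding step with an $\ell_\infty$ clip. This is a minimal illustration, not the authors' exact FTAS$\ell_{0+\infty}^{\mathcal{G}}$ update; the group size, the number of kept groups `k`, the perturbation budget `eps`, the NES parameters, and the toy quadratic loss are all assumptions made for the example.

```python
# Minimal sketch (assumed parameters, not the paper's exact FTAS solution):
# NES search-gradient estimation plus a non-overlapping group-sparse (l0^G)
# selection step and an l_inf clip on the kept groups.
import numpy as np

def nes_gradient(loss, x, sigma=0.01, n_samples=50, rng=None):
    """Estimate the gradient of a black-box `loss` at x with NES search gradients."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        # Antithetic sampling: query the loss at x + sigma*u and x - sigma*u.
        grad += (loss(x + sigma * u) - loss(x - sigma * u)) * u
    return grad / (2 * sigma * n_samples)

def project_group_sparse_linf(delta, group_size=8, k=10, eps=0.05):
    """Keep the k non-overlapping blocks of `delta` with the largest energy,
    zero out all other blocks, and clip the kept blocks to the l_inf ball."""
    h, w = delta.shape
    out = np.zeros_like(delta)
    energies = []
    for i in range(0, h, group_size):
        for j in range(0, w, group_size):
            block = delta[i:i + group_size, j:j + group_size]
            energies.append((float(np.sum(block ** 2)), i, j))
    # For non-overlapping groups, keeping the k largest-energy groups is the
    # closed-form solution to the group-sparse projection.
    for _, i, j in sorted(energies, reverse=True)[:k]:
        out[i:i + group_size, j:j + group_size] = np.clip(
            delta[i:i + group_size, j:j + group_size], -eps, eps)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((32, 32))                      # toy grayscale "image"
    loss = lambda z: float(np.sum(z ** 2))        # stand-in for the attack loss
    delta = np.zeros_like(x)
    for _ in range(20):                           # a few NES + projection updates
        g = nes_gradient(loss, x + delta, rng=rng)
        delta = project_group_sparse_linf(delta + 0.01 * g)
    print("number of perturbed groups:", int(np.count_nonzero(delta) / 64))
```

In this sketch the perturbation stays supported on at most `k` blocks, mirroring the $\ell_0^{\mathcal{G}}$ constraint, while the clip enforces the $\ell_\infty$ budget; the overlapping-group case described in the abstract would replace the block ranking with a greedy selection over overlapping groups.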
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3471