Leveraging image representations for bounded adversarial attacks and robustness

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Invertible image transforms, Reversible generative models, DCT, DWT, Bounded white-box attacks, Adversarial training, Corruption robustness.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Both classical and learned image transformations, such as the discrete wavelet transform (DWT) and flow-based generative models, provide semantically meaningful representations of images. In this paper, we propose a general method for robustness that exploits the expressiveness of image representations by targeting substantially lower-dimensional subspaces inside the $L^\infty$ box. Experiments with the DCT, DWTs, and Glow produce adversarial examples that are significantly more similar to the original images than those found over the full $L^\infty$ box. Further, through adversarial training we show that robustness under the introduced constraints transfers better to robustness against a broad class of common image perturbations than training with the standard $L^\infty$ box, without a major sacrifice of natural accuracy.
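To make the constraint concrete, here is a minimal sketch (not the authors' implementation) of one signed-gradient attack step restricted to a low-frequency DCT subspace and then clipped to the $L^\infty$ box. The function name, the `k`-by-`k` frequency mask, and the single-step formulation are illustrative assumptions; the paper's method may differ in how the subspace and optimization are defined.

```python
import numpy as np
from scipy.fft import dctn, idctn

def low_freq_dct_step(image, grad, eps, k):
    """Illustrative sketch: one FGSM-style step confined to the k x k
    low-frequency DCT subspace, then projected into the L-infinity box
    of radius eps. All names/parameters here are hypothetical."""
    g = dctn(grad, norm="ortho")
    mask = np.zeros_like(g)
    mask[:k, :k] = 1.0                      # keep only low frequencies
    delta = idctn(eps * np.sign(g) * mask, norm="ortho")
    delta = np.clip(delta, -eps, eps)       # the L-infinity bound still holds
    return np.clip(image + delta, 0.0, 1.0)
```

Because the perturbation lives in a `k*k`-dimensional subspace rather than the full pixel space, the resulting adversarial example tends to be smoother and perceptually closer to the original, which is the effect the abstract describes.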
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7947