Purify Perturbative Availability Poisons via Rate-Constrained Variational Autoencoders

18 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: perturbative availability poisoning attack, defense, variational autoencoders
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Perturbative availability poisoning attacks seek to maximize testing error by making subtle modifications to correctly labeled training examples. Defensive strategies against these attacks can be categorized by whether specific interventions are adopted during the training phase. The first approach is training-time defense, such as adversarial training, which can effectively mitigate poisoning effects but is computationally intensive. The other approach is pre-training purification, *e.g.,* image shortcut squeezing, which consists of several simple compressions but often struggles against diverse poison types. Our work provides a novel disentanglement mechanism to build an efficient pre-training purification method that achieves superior performance to all existing defenses. Firstly, we show that rate-constrained variational autoencoders (VAEs) exhibit a clear tendency to suppress poison patterns by minimizing the mutual information captured in the latent space. We then conduct a theoretical analysis to explain this phenomenon. Building upon these insights, we introduce a disentangle variational autoencoder (D-VAE), capable of disentangling the added perturbations with learnable class-wise embeddings. Based on this network, a two-stage purification approach is naturally developed. The first stage focuses on roughly suppressing poison patterns, while the second stage produces refined, poison-free results, ensuring effectiveness and robustness across various scenarios and datasets. Extensive experiments demonstrate the remarkable performance of our method on CIFAR-10, CIFAR-100, and a 100-class ImageNet subset with multiple poison types and different perturbation levels.
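The rate-constrained VAE idea from the abstract can be sketched as follows. This is a hypothetical, minimal illustration (not the authors' D-VAE): the KL term of a VAE objective is the "rate", an upper bound on the mutual information between input and latent, and penalizing rate above a small budget limits what the latent code can carry, which is the mechanism claimed to suppress poison perturbations. The module names, dimensions, `rate_budget`, and `beta` are all illustrative assumptions.

```python
# Hypothetical sketch of a rate-constrained VAE objective (not the paper's code).
# The KL term ("rate") upper-bounds I(x; z); clamping it to a budget restricts
# how much information the latent can carry, which is the intuition behind
# suppressing subtle poison patterns during reconstruction.
import torch
import torch.nn as nn

class RateConstrainedVAE(nn.Module):
    def __init__(self, dim=64, latent=8):
        super().__init__()
        self.enc = nn.Linear(dim, 2 * latent)  # outputs mean and log-variance
        self.dec = nn.Linear(latent, dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar

def rate_constrained_loss(model, x, rate_budget=0.5, beta=10.0):
    recon, mu, logvar = model(x)
    rec = ((recon - x) ** 2).sum(dim=-1).mean()  # reconstruction term
    # KL(q(z|x) || N(0, I)) per example, averaged over the batch
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1).mean()
    # Penalize only the rate exceeding the budget (hinge-style constraint)
    return rec + beta * torch.clamp(kl - rate_budget, min=0.0)

torch.manual_seed(0)
model = RateConstrainedVAE()
x = torch.randn(16, 64)
loss = rate_constrained_loss(model, x)
```

In practice the reconstruction would be over images with convolutional encoder/decoder, and the budget would be tuned so the first purification stage coarsely suppresses poisons while retaining semantic content.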
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1202