Sparse-Guard: Sparse Coding-Based Defense against Model Inversion Attacks

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Privacy Attacks, Model Inversion Attack, Sparse Coding, Attack Defense, Image Reconstruction
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We provide the first study of neural network architectures that are robust to model inversion attacks. We develop a novel sparse coding-based architecture, $Sparse$-$Guard$, that outperforms state-of-the-art defenses.
Abstract: In this paper, we study neural network architectures that are robust to model inversion attacks. It is well known that standard network architectures are vulnerable to model inversion, where an adversary can reconstruct images or other data used to train the network by inspecting the network's output or the intermediate outputs of a single hidden layer. Surprisingly, very little is known about how a network's architecture contributes to its robustness (or vulnerability). Instead, recent work on mitigating such attacks has focused on injecting random noise into the network layers or augmenting the training dataset with synthetic data. Our main result is a novel sparse coding-based network architecture, $Sparse$-$Guard$, that is robust to model inversion attacks. Three decades of computer science research have studied sparse coding in the context of image denoising, object recognition, and adversarial misclassification, but to the best of our knowledge, its connection to state-of-the-art privacy vulnerabilities remains unstudied. However, sparse coding architectures offer an advantageous means of preventing privacy attacks because they allow us to control the amount of irrelevant private information encoded in a model's intermediate representations, in a manner that can be computed efficiently during training, that adds little to the trained model's overall parameter complexity, and that is known to have little effect on classification accuracy. Specifically, we demonstrate that, compared to networks trained with state-of-the-art noise-based or data augmentation-based defenses, $Sparse$-$Guard$ networks maintain comparable or higher classification accuracy while degrading state-of-the-art training data reconstructions by a factor of $1.2$ to $16.2$ across a variety of reconstruction quality metrics (PSNR, SSIM, FID) on standard datasets. We also show that $Sparse$-$Guard$ is equally robust to attacks regardless of whether the leaked layer is early or late in the network, suggesting it is also an effective defense under emerging distributed training paradigms such as Federated Learning.
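To make the abstract's architectural idea concrete: the defense replaces a dense intermediate representation with a sparse code over a learned dictionary, so only a few coefficients (and hence less extraneous private detail) are exposed to an attacker who observes that layer. The paper's exact layer design is not given in this abstract; the sketch below is a minimal, hypothetical illustration of a generic sparse coding layer (ISTA over a learned dictionary), with all module names and hyperparameters invented for illustration rather than taken from $Sparse$-$Guard$.

```python
# Minimal sketch (not the authors' implementation): a sparse coding layer that
# substitutes a sparse code over a learned dictionary for a dense hidden
# activation, computed with a few ISTA iterations. Names and hyperparameters
# are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseCodingLayer(nn.Module):
    def __init__(self, in_dim, code_dim, n_iters=10, sparsity_weight=0.1):
        super().__init__()
        # Learned dictionary: each column is an atom used to reconstruct the input.
        self.dictionary = nn.Parameter(torch.randn(in_dim, code_dim) * 0.01)
        self.n_iters = n_iters
        self.sparsity_weight = sparsity_weight  # L1 penalty controlling how sparse the code is

    def forward(self, x):
        # ISTA: approximately minimize ||x - z D^T||^2 + lambda * ||z||_1 over codes z.
        D = self.dictionary
        # Step size from the Lipschitz constant of the quadratic term.
        step = 1.0 / (torch.linalg.matrix_norm(D, ord=2) ** 2 + 1e-8)
        z = torch.zeros(x.size(0), D.size(1), device=x.device)
        for _ in range(self.n_iters):
            residual = x - z @ D.T          # reconstruction error in input space
            z = z + step * (residual @ D)   # gradient step on the quadratic term
            # Soft-thresholding induces sparsity in the code.
            z = torch.sign(z) * F.relu(z.abs() - step * self.sparsity_weight)
        # The (sparse) code, rather than the dense activation, is what later layers
        # consume and what a layer-leakage attack would observe.
        return z
```

In this reading, `sparsity_weight` plays the role the abstract attributes to the architecture: it trades off how much information about the input survives into the intermediate representation against reconstruction fidelity, without adding noise or synthetic training data.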
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6812