Mitigating Privacy Risk of Adversarial Examples with Counterfactual Explanations

28 Sept 2024 (modified: 05 Feb 2025), Submitted to ICLR 2025, CC BY 4.0
Keywords: Adversarial Examples, Privacy, Counterfactual Explanations
TL;DR: We mitigate the privacy risks of adversarial examples and, for the first time, use a counterfactual explanation method to generate adversarial examples.
Abstract: Robustness and privacy are two fundamental security properties that machine learning models require. Without a balance between the two, robust models carry high privacy risks. Obtaining machine learning models with both high adversarial robustness and strong privacy performance remains an open problem. To enhance the privacy performance of robust models, we employ counterfactual explanations to mitigate privacy risks while maintaining robust model accuracy. This reduces the privacy risk of the robust model to the level of random guessing and uses counterfactual explanations to generate adversarial examples for the first time. We analyze the similarities and differences between adversarial examples and counterfactual explanations and use these properties to design the generation method. We also analyze in depth the advantages that counterfactual explanations offer over traditional adversarial examples. Our study indicates that robustness and privacy are strongly correlated and that the ideal balance of accuracy, robustness, and privacy is achieved when 95% adversarial examples are involved in model training.
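Illustration (not from the submission): the abstract does not name the attack used to measure privacy risk. The sketch below assumes a simple confidence-threshold membership inference attack evaluated on equally sized member and non-member sets, where "the level of random guessing" would correspond to an attack accuracy of about 0.5. The function names and the toy confidence values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def threshold_attack_accuracy(
    member_conf: Sequence[float],
    nonmember_conf: Sequence[float],
    threshold: float,
) -> float:
    """Accuracy of a confidence-threshold membership inference attack.

    The attacker guesses "member" when the model's confidence on a sample
    is at or above `threshold`, and "non-member" otherwise.
    """
    if not member_conf or not nonmember_conf:
        raise ValueError("both sample sets must be non-empty")
    correct_members = sum(c >= threshold for c in member_conf)
    correct_nonmembers = sum(c < threshold for c in nonmember_conf)
    total = len(member_conf) + len(nonmember_conf)
    return (correct_members + correct_nonmembers) / total


def best_attack_accuracy(
    member_conf: Sequence[float],
    nonmember_conf: Sequence[float],
) -> float:
    """Best accuracy over all thresholds taken from the observed confidences."""
    candidates = sorted(set(member_conf) | set(nonmember_conf))
    # Also try a threshold that labels every sample as a non-member.
    candidates.append(float("inf"))
    return max(
        threshold_attack_accuracy(member_conf, nonmember_conf, t)
        for t in candidates
    )


# Toy data (assumed): a model that is far more confident on training members.
leaky_members = [0.99, 0.98, 0.97, 0.96]
leaky_nonmembers = [0.60, 0.70, 0.55, 0.65]

# Toy data (assumed): a model whose confidences look the same on both sets.
private_members = [0.60, 0.70, 0.80, 0.90]
private_nonmembers = [0.60, 0.70, 0.80, 0.90]

leaky_acc = best_attack_accuracy(leaky_members, leaky_nonmembers)
private_acc = best_attack_accuracy(private_members, private_nonmembers)
```

With balanced sets, an attacker who cannot separate the two confidence distributions gets no better than 0.5, which is the sense of "random guessing" assumed here. The toy values only show the two extremes and say nothing about the paper's actual results.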
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 14241