From Local Explainability to Global Robustness: Improving the Robustness of Machine Learning Models Using Counterfactual Explanations

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Security, Adversarial Learning, Adversarial Robustness, Tabular Data, Explainability
Abstract: Sophisticated new adversarial attacks are introduced at a rapid rate, and these threats have been accompanied by a wide variety of defense techniques, including robustness methods. This paper proposes a novel attack-agnostic robustness method that utilizes the local explainability capabilities of counterfactual explanations (CFEs) to improve the robustness of classical machine learning models trained on structured (tabular) data. In order to defend target models, we train an auxiliary denoising autoencoder (DAE) on benign and CFE data. The DAE serves as a defense mechanism by denoising the input, which can be benign or adversarial, and reconstructing it into the benign data manifold before it is introduced to the target model. We also suggest four protection mechanisms that utilize our DAE, one of which serves as a preventative approach and does not require any changes to the target model. In the other three protection mechanisms, the target model is trained on benign and CFE data in order to both accurately fit the decision boundaries to diverse samples and improve the model’s robustness to diverse perturbations. In our evaluation on three structured datasets, the proposed robustness method achieved results comparable to state-of-the-art robustness techniques that are not attack-agnostic.
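A minimal sketch of the preventative defense variant described in the abstract, not the authors' implementation: an auxiliary denoising autoencoder is trained on benign plus counterfactual (CFE) data, and at inference time every input is reconstructed by the DAE before reaching the unmodified target model. The dataset, the CFE generation, and the use of scikit-learn's MLPRegressor as a stand-in autoencoder are all assumptions for illustration.

```python
# Hedged sketch only -- assumes placeholder tabular data and synthetic "CFEs";
# a real setup would generate CFEs with a counterfactual-explanation method.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder benign samples, labels, and CFE samples (here just perturbed copies
# with flipped labels, standing in for real counterfactual explanations).
X_benign = rng.normal(size=(500, 10))
y_benign = (X_benign[:, 0] > 0).astype(int)
X_cfe = X_benign + rng.normal(scale=0.3, size=X_benign.shape)
y_cfe = 1 - y_benign

# 1) Train the auxiliary DAE on noisy versions of benign + CFE data so it learns
#    to map perturbed inputs back onto the (benign) data manifold.
X_train = np.vstack([X_benign, X_cfe])
X_noisy = X_train + rng.normal(scale=0.2, size=X_train.shape)
dae = MLPRegressor(hidden_layer_sizes=(6,), max_iter=2000, random_state=0)
dae.fit(X_noisy, X_train)  # learns a noisy -> clean reconstruction

# 2) Train the target model on benign data only -- the preventative variant that
#    requires no change to the target model's training procedure.
clf = RandomForestClassifier(random_state=0).fit(X_benign, y_benign)

# 3) At inference time, every input (benign or adversarial) is first denoised by
#    the DAE and only then passed to the target model.
def defended_predict(x):
    x_denoised = dae.predict(np.atleast_2d(x))
    return clf.predict(x_denoised)

x_adv = X_benign[0] + 0.5  # crude stand-in for an adversarial perturbation
print(defended_predict(x_adv))
```

The other three protection mechanisms described in the abstract would differ only in step 2: the target model would also be trained on benign and CFE data rather than on benign data alone.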
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7865