From Local Explainability to Global Robustness: Improving the Robustness of Machine Learning Models Using Counterfactual Explanations
Keywords: Security, Adversarial Learning, Adversarial Robustness, Tabular Data, Explainability
Abstract: Sophisticated new adversarial attacks are being introduced
at a rapid rate. Such threats have been accompanied by the creation of a
wide variety of defense techniques, including robustness techniques. This
paper proposes a novel attack-agnostic robustness method that uses the
local explainability of counterfactual explanations (CFEs) to improve the
robustness of classical machine learning models trained on structured
(tabular) data. To defend target models,
we train an auxiliary denoising autoencoder (DAE) on benign and
CFE data. The DAE acts as a defense mechanism: it denoises the
input, which may be benign or adversarial, and reconstructs it onto the
benign data manifold before it is passed to the target model. We also
propose four protection mechanisms that use our DAE. One serves as a
preventative approach and requires no changes to the target model. In
the other three, the target model is trained on benign and CFE data, both
to fit its decision boundaries more accurately across varied samples and
to improve its robustness to diverse perturbations. In our evaluation on three structured
datasets, the proposed method achieved results comparable to those of
state-of-the-art robustness techniques that are not attack-agnostic.
Primary Area: general machine learning (i.e., none of the above)
Submission Number: 7865