Machine Learning Explainability and Robustness: Connected at the Hip

KDD 2021 (modified: 03 Dec 2021)
Abstract: This tutorial examines the synergistic relationship between explainability methods for machine learning and a significant problem related to model quality: robustness against adversarial perturbations. We begin with a broad overview of approaches to explainable AI, before narrowing our focus to post-hoc explanation methods for predictive models. We discuss perspectives on what constitutes a "good" explanation in various settings, with an emphasis on axiomatic justifications for explanation methods. In doing so, we highlight the importance of an explanation method's faithfulness to the target model, as this property allows one to distinguish between explanations that are unintelligible because of the method used to produce them and cases where a seemingly poor explanation points to genuine model quality issues. Next, we introduce concepts surrounding adversarial robustness, including adversarial attacks as well as a range of corresponding state-of-the-art defenses. Finally, building on the knowledge presented thus far, we present key insights from the recent literature on the connections between explainability and robustness, showing that many commonly perceived explainability issues may be caused by non-robust model behavior. Accordingly, a careful study of adversarial examples and robustness can lead to models whose explanations better appeal to human intuition and domain knowledge.
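
As an illustrative sketch only (not material from the tutorial itself), two of the technical ingredients named in the abstract, a gradient-based post-hoc explanation and a single-step adversarial attack (FGSM), can be demonstrated on a toy PyTorch model. The model, input, label, and perturbation budget below are placeholders chosen for demonstration, not anything specified by the authors.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy classifier standing in for any differentiable predictive model (placeholder).
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
x = torch.randn(1, 4, requires_grad=True)  # a single toy input
y = torch.tensor([1])                      # an assumed label for that input
loss_fn = nn.CrossEntropyLoss()

# Post-hoc explanation: input-gradient saliency, i.e. the sensitivity of the
# predicted-class score to each input feature.
logits = model(x)
logits[0, logits.argmax()].backward()
saliency = x.grad.detach().abs()
print("saliency:", saliency)

# Adversarial attack: FGSM nudges the input in the direction that increases
# the loss, within an L-infinity budget epsilon.
x.grad = None
loss = loss_fn(model(x), y)
loss.backward()
epsilon = 0.1  # perturbation budget (assumed for illustration)
x_adv = (x + epsilon * x.grad.sign()).detach()
print("prediction before:", model(x).argmax(dim=1).item(),
      "after:", model(x_adv).argmax(dim=1).item())
```

In this sketch, a saliency map that looks noisy may reflect non-robust model behavior rather than a flaw in the explanation method, which is the faithfulness point the abstract emphasizes.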