Keywords: Class-wise explanation, Backdoor attack detection, Global explanation
Abstract: Many explainable AI (XAI) methods have been proposed to interpret a neural network's decisions, explaining locally why it predicts what it predicts through gradient information. However, existing works focus mainly on local explanation and lack the global knowledge needed to show class-wise explanations over the whole training procedure. To fill this gap, we propose to visualize a global explanation in the input space for every class learned during training. Specifically, our solution finds a representation set that demonstrates the learned knowledge for each class. To achieve this goal, we optimize the representation set by imitating the model training procedure over the full dataset. Experimental results show that our method generates high-quality class-wise explanations on a series of image classification datasets. Using our global explanations, we further analyze the knowledge learned by models under different training procedures, including adversarial training and noisy-label learning. Moreover, we illustrate that the generated explanations lend insight into diagnosing model failures, such as revealing the trigger in a backdoored model.
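The abstract states that the per-class representation set is optimized "by imitating the model training procedure over the full dataset." The sketch below is a hypothetical illustration of one plausible instantiation (it is not the authors' code): a small set of synthetic inputs, one per class, is optimized so that the gradients it induces match the gradients from real training batches. The model architecture, loss choice, and the `grad_match_loss` helper are all assumptions made for this toy example.

```python
# Hypothetical sketch, not the authors' implementation: optimize a small
# per-class "representation set" so that the gradients it induces imitate
# the gradients produced by real data batches during training.
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_match_loss(model, synth_x, synth_y, real_x, real_y):
    """Distance between gradients induced by synthetic and real batches."""
    params = [p for p in model.parameters() if p.requires_grad]
    # Keep the graph for the synthetic gradients so the matching loss
    # can be back-propagated into synth_x.
    g_syn = torch.autograd.grad(F.cross_entropy(model(synth_x), synth_y),
                                params, create_graph=True)
    g_real = torch.autograd.grad(F.cross_entropy(model(real_x), real_y), params)
    return sum(F.mse_loss(gs, gr.detach()) for gs, gr in zip(g_syn, g_real))

# Toy setup: a 10-class classifier on 3x32x32 inputs, one synthetic image per class.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(),
                      nn.Linear(128, 10))
synth_x = torch.randn(10, 3, 32, 32, requires_grad=True)  # representation set
synth_y = torch.arange(10)                                 # one image per class
opt = torch.optim.Adam([synth_x], lr=0.1)

for step in range(100):
    # Stand-in for a real training batch; in practice this would be drawn
    # from the full dataset the model was trained on.
    real_x = torch.randn(64, 3, 32, 32)
    real_y = torch.randint(0, 10, (64,))
    loss = grad_match_loss(model, synth_x, synth_y, real_x, real_y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Under this (assumed) reading, the optimized `synth_x` images serve as the class-wise global explanations visualized in the input space.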
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (eg, AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)
TL;DR: We propose a method to visualize global explanations in the input space for every class learned during training.