Learning by Self-Explaining

TMLR Paper 2479 Authors

05 Apr 2024 (modified: 07 Apr 2024) · Under review for TMLR · CC BY-SA 4.0
Abstract: Current AI research mainly treats explanations as a means for model inspection. Yet, this neglects findings from human psychology that describe the benefit of self-explanations in an agent’s learning process. Motivated by this, we introduce a novel approach in the context of image classification, termed Learning by Self-Explaining (LSX). LSX utilizes aspects of self-refining AI and human-guided explanatory machine learning. The underlying idea is that a learner model, in addition to optimizing for the original predictive task, is further optimized based on explanatory feedback from an internal critic model. Intuitively, a learner’s explanations are considered “useful” if the internal critic can perform the same task given these explanations. We provide an overview of important components of LSX and, based on this, perform extensive experimental evaluations via three different example instantiations. Our results indicate improvements via Learning by Self-Explaining on several levels: in terms of model generalization, reducing the influence of confounding factors, and providing more task-relevant and faithful model explanations. Overall, our work provides evidence for the potential of self-explaining within the learning phase of an AI model.
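To make the described loop concrete, below is a minimal, self-contained sketch of the general idea as stated in the abstract, not the paper's actual instantiations: a learner classifier, an internal critic trained on the learner's explanations alone, and a combined objective. The input-gradient saliency used as the "explanation", the module architectures, the names `learner`, `critic`, and `lsx_step`, and the weight `lambda_expl` are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Learner solves the image classification task; the critic only ever
# sees the learner's explanations. Both architectures are placeholders.
learner = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
critic = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt_learner = torch.optim.Adam(learner.parameters(), lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)
lambda_expl = 0.5  # weight of the explanatory feedback (assumed)

def explain(model, x, y):
    """Toy explanation: input-gradient saliency for the target class."""
    x = x.clone().requires_grad_(True)
    score = model(x).gather(1, y.unsqueeze(1)).sum()
    (grad,) = torch.autograd.grad(score, x, create_graph=True)
    return grad.abs()  # keep the graph so the learner can be trained through it

def lsx_step(x, y):
    expl = explain(learner, x, y)

    # Critic update: can it perform the same task given only the explanations?
    opt_critic.zero_grad()
    F.cross_entropy(critic(expl.detach()), y).backward()
    opt_critic.step()

    # Learner update: base task loss plus the critic's explanatory feedback;
    # explanations count as "useful" if the critic succeeds with them.
    opt_learner.zero_grad()
    task_loss = F.cross_entropy(learner(x), y)
    feedback = F.cross_entropy(critic(expl), y)
    (task_loss + lambda_expl * feedback).backward()
    opt_learner.step()

# Example: one LSX step on a dummy batch of 8 grayscale 28x28 images.
x, y = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
lsx_step(x, y)
```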
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=OvJZ58BmOE
Changes Since Last Submission: Dear editors and reviewers of TMLR, in the following we highlight the changes we have made for our resubmission based on the feedback of the previous reviewing round (we abbreviate our approach, Learning by Self-Explaining, as LSX). In accordance with the previous AE's and reviewers' recommendations, we have substantially updated and restructured the paper in the following ways:
- We have narrowed the scope of the work to focus on image classification and refrained from using the term "paradigm" in the context of LSX.
- We have narrowed the motivation in the introduction to the core aspects, i.e., combining ideas of self-refining machine learning and explanatory interactive learning, and have updated Fig. 1 accordingly.
- As requested, we have moved all general discussions that are not directly related to our contributions and evaluations (e.g., discussions on system 1 and 2, human-machine interactions) into Suppl. D. This concentrates our work on our main contributions and claims.
- As requested, we have moved the descriptions of the example instantiations to the experimental evaluations in Sec. 3.1 (rather than a large separate section, as previously) and the corresponding details to Suppl. A.1-3.
- Moreover, for clarity, we have provided a tabular overview of these instantiations in Tab. 1, based on our typology of Sec. 2.
- We have added extensive additional experiments, as the previous AE had suggested (in accordance with the mentioned paper [1]), to provide more in-depth analyses of our approach. In this context, we have added several ablation evaluations (Sec. 3.4) as well as extensive discussions of these (Sec. 4), aimed at investigating the potential limits of our approach.
- As suggested, we have added a novel, third instantiation of our approach (denoted VLM-LSX) that goes beyond one modality and image classification. The results can be found in Tab. 6 and further strengthen our claims.
- We have added an extensive discussion of our results concerning confounding mitigation in Sec. 3.4, as previous reviews had requested.
- We also provide an intuitive interpretation of what our approach is "doing" (Sec. 4), as in [1].
- We have added overviews of all datasets used in our evaluations in Suppl. B, as well as example explanation visualizations concerning unconfounding in Suppl. C.3 (as previous reviews had requested).
- We have provided an anonymous link to our code repository, as requested.
- We have fixed notation errors in Sec. 3, as requested.
We believe these changes have substantially improved the structure and scope of our work. Best, the Authors
[1] Zhang, Hongyi, et al. "mixup: Beyond Empirical Risk Minimization." ICLR, 2018.
Assigned Action Editor: ~changjian_shui1
Submission Number: 2479