Keywords: explanation-by-design, class-defining-features
TL;DR: We propose a method that improves the interpretability of neural networks by decomposing images into regions based on learned concepts and tracing these regions to corresponding parts of training images.
Abstract: Aligning machine representations with human understanding is key to improving interpretability of machine learning (ML) models.
When classifying a new image, humans often explain their decisions by decomposing the image into concepts and pointing to corresponding regions in familiar images.
Current ML explanation techniques typically trace the decision-making process to reference prototypes, generate attribution maps highlighting feature importance, or incorporate intermediate bottlenecks designed to align with human-interpretable concepts.
The proposed method, named COMiX, classifies an image by decomposing it into regions based on learned concepts and tracing each region to corresponding regions in training images, ensuring that explanations fully represent the actual decision-making process. We dissect the test image, using selected internal representations of a neural network, into prototypical parts (primitives) and match them with the corresponding primitives derived from the training data.
We theoretically prove, and demonstrate in a series of qualitative and quantitative experiments, that our method, in contrast to \textit{post hoc} analysis, guarantees the fidelity of explanations, and we show that its efficiency is competitive with other inherently interpretable architectures. Notably, it achieves substantial improvements in fidelity and sparsity metrics, including a $48.82\%$ improvement in the C-insertion score on the ImageNet dataset over the best state-of-the-art baseline.
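For concreteness, the following is a minimal sketch of the decompose-and-match idea described above, under assumptions not specified in this abstract: a toy convolutional backbone stands in for the selected internal layer, regions are formed by a hard argmax over concept channels, and matching uses cosine similarity with a majority vote. None of these choices should be read as COMiX's actual design.

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for the selected internal layer of a pretrained network;
# each of its 8 output channels plays the role of a learned "concept".
backbone = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)

def primitives(img):
    # Decompose an image into per-concept primitives: assign each pixel to
    # its most active concept channel (hard argmax -- an assumption here),
    # then average the features over each concept's region.
    feat = backbone(img.unsqueeze(0)).squeeze(0)   # (C, H, W)
    assign = feat.argmax(dim=0)                    # winning concept per pixel
    prims = {}
    for c in range(feat.shape[0]):
        mask = assign == c
        if mask.any():
            prims[c] = feat[:, mask].mean(dim=1)   # (C,) region descriptor
    return prims

# Synthetic "training set" reduced to labelled primitives.
train_imgs = torch.randn(20, 3, 32, 32)
train_labels = torch.randint(0, 2, (20,))
train_db = [(primitives(x), int(y)) for x, y in zip(train_imgs, train_labels)]

def classify(img):
    # Match each test primitive to its nearest training primitive of the
    # same concept (cosine similarity) and vote over the matched labels.
    votes = {}
    for c, p in primitives(img).items():
        best_sim, best_label = -1.0, None
        for prims, label in train_db:
            if c in prims:
                sim = F.cosine_similarity(p, prims[c], dim=0).item()
                if sim > best_sim:
                    best_sim, best_label = sim, label
        if best_label is not None:
            votes[best_label] = votes.get(best_label, 0) + 1
    return max(votes, key=votes.get)

print(classify(torch.randn(3, 32, 32)))

Because every vote comes from a specific matched training primitive, each prediction in this sketch can be traced back to a concrete training image and concept region, which is the fidelity-by-construction property the abstract emphasizes.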
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8149