Enhancing Interpretability: A Versatile Clue-Based Framework for Faithful In-Depth Interpretations

22 Jan 2026 (modified: 11 May 2026) · Decision pending for TMLR · CC BY 4.0
Abstract: Despite their state-of-the-art performance, deep neural networks are susceptible to bias and can malfunction in unforeseen situations. Moreover, the complex computations underlying their reasoning are not human-understandable, hindering the development of trust and the validation of decisions. Local explanation methods aim to explain individual model decisions with two key goals: faithfulness to the model and human-understandability. However, existing approaches often suffer from performance loss, limited applicability to pre-trained models, and unfaithful explanations. Seeking more faithful interpretations, we introduce a novel definition, the Distinguishing Clue: a set of input regions that uniquely promote a specific network decision, detected by our Local Attention Perception (LAP) module. Our training scheme allows LAP to learn these clues without relying on expert annotations, and it also provides a means of injecting both general and expert knowledge. The framework can be used to train networks from scratch, to enhance their interpretability, and to explain networks that have already been trained. We demonstrate the superiority of the proposed method by evaluating it on different architectures across two datasets, including ImageNet. The proposed framework offers interpretations that are more valid and more faithful to the model than those produced by commonly used explainer methods.
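The abstract does not specify how the LAP module is implemented, but the idea of detecting class-specific clue regions can be illustrated with a minimal PyTorch sketch: a small head attached to a backbone's feature maps that produces a per-class spatial attention ("clue") map and classifies from the attended scores. The name `LAPHead` and all design choices below are assumptions for illustration, not the paper's actual module.

```python
import torch
import torch.nn as nn

class LAPHead(nn.Module):
    """Toy clue-detection head: scores each spatial location per class,
    normalizes the scores into per-class attention ("clue") maps, and
    classifies from the attended scores. Illustrative only."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        # 1x1 convolution assigns each spatial location a score per class.
        self.clue_scores = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, H, W) feature maps from a frozen or trainable backbone.
        scores = self.clue_scores(feats)                       # (B, K, H, W)
        b, k, h, w = scores.shape
        # Softmax over spatial positions -> one attention map per class.
        clues = torch.softmax(scores.view(b, k, -1), dim=-1).view(b, k, h, w)
        # Class logits: attention-weighted sum of each class's score map.
        logits = (clues * scores).sum(dim=(2, 3))              # (B, K)
        return logits, clues

# Usage sketch: attach the head to any convolutional backbone's features,
# e.g. the 512x7x7 output of a ResNet-18 on a 224x224 image.
feats = torch.randn(2, 512, 7, 7)
head = LAPHead(in_channels=512, num_classes=10)
logits, clue_maps = head(feats)
print(logits.shape, clue_maps.shape)  # (2, 10) and (2, 10, 7, 7)
```

Because such a head only reads backbone features, it could in principle be trained on top of an already-trained network or jointly with one trained from scratch, which is the usage mode the abstract describes; how the paper actually supervises the clue maps without expert annotations is not stated here.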
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Quanshi_Zhang1
Submission Number: 7109