Keywords: Archetype Decomposition, Model Transparency
Abstract: Traditional concept decomposition methods have made significant progress in improving the interpretability of deep learning models, but they still face many challenges. A key issue is that they often lack traceable explanations for concepts, making it difficult to understand and verify how models make decisions and provide explanations grounded in specific concepts. To overcome this limitation, this paper proposes a new method, Conceptual Archetype Decomposition (CAD), which aims to provide a more interpretable concept learning and decision-making process. Unlike existing methods, our approach ensures that each concept is represented as a linear combination of training samples whose combination weights sum to 1. This constraint restricts the learning space of the concepts and enhances their interpretability. The advantage of our method therefore lies in its fine-grained concept activation decomposition, which directly constructs an explanatory space between training samples and concepts. Through a dual-index decision mechanism, we further analyze the relationship between test samples and training samples. Extensive experiments on the CUB and ImageNet datasets demonstrate that our model not only improves decision transparency but also exhibits stronger generalization ability in multi-class classification tasks. Our code is available at: https://anonymous.4open.science/r/CAD-4510/
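The following is a minimal, illustrative sketch (not the authors' implementation) of the constraint described in the abstract: each concept is a linear combination of training-sample features whose weights sum to 1. All names (`ArchetypeConcepts`, the feature dimensions, the use of a softmax to enforce the simplex constraint) are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArchetypeConcepts(nn.Module):
    """Hypothetical sketch: each concept is a combination of cached
    training-sample features, with mixing weights that sum to 1."""

    def __init__(self, num_concepts: int, num_train_samples: int):
        super().__init__()
        # Learnable logits; a softmax turns each row into weights summing to 1.
        self.logits = nn.Parameter(torch.zeros(num_concepts, num_train_samples))

    def forward(self, train_feats: torch.Tensor) -> torch.Tensor:
        # weights: (num_concepts, num_train_samples); each row sums to 1,
        # so every concept is traceable back to specific training samples.
        weights = F.softmax(self.logits, dim=-1)
        # concepts: (num_concepts, feat_dim), linear combinations of training samples.
        return weights @ train_feats

# Usage sketch: project test features onto the learned concepts.
train_feats = torch.randn(500, 128)            # cached training-sample features (assumed)
model = ArchetypeConcepts(num_concepts=10, num_train_samples=500)
concepts = model(train_feats)                  # (10, 128)
test_feats = torch.randn(4, 128)
activations = test_feats @ concepts.T          # per-concept activations for test samples
```

Because the mixing weights form a distribution over training samples, each concept activation can be decomposed back onto individual training samples, which is the traceability property the abstract emphasizes.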
Primary Area: interpretability and explainable AI
Submission Number: 6082