C2MF: Consistent and Concept-Unified Matrix Factorization for Interpretable and Robust Concept Discovery
Keywords: Interpretability, matrix factorization, concept discovery, concept-based explanation
Abstract: Deep neural networks have achieved remarkable performance in various domains, but their opacity remains a significant challenge, particularly in high-risk applications. Traditional attribution methods highlight important input regions but fail to reveal the underlying semantic concepts driving model decisions. Recent methods such as TCAV and CRAFT attempt to address this gap by extracting interpretable concepts, but they suffer from limitations such as distribution mismatch between training and inference, reliance on non-negative activation constraints, and the lack of a shared concept dictionary across categories. In this paper, we introduce the Consistent and Concept-Unified Matrix Factorization (C2MF) method, a novel approach that overcomes these issues. By leveraging full-image representations instead of cropped sub-regions, C2MF ensures consistency between training and inference distributions, improving robustness and confidence calibration. We also relax the non-negativity constraint, allowing both positive and negative concept activations, which enhances the flexibility and fidelity of learned concepts. Furthermore, we propose a shared global concept dictionary across all categories, enabling concept reuse and improving interpretability. Through extensive experiments on the ImageNet and CUB datasets, we demonstrate that C2MF outperforms state-of-the-art methods in terms of concept faithfulness, category reconstruction accuracy, and generalization across categories. Our code is available at: https://anonymous.4open.science/r/C2MF-E760/
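The abstract's two factorization changes (signed concept activations and a single dictionary shared across categories) can be illustrated with a minimal sketch. This is not the paper's actual optimization; it simply factorizes stacked full-image activations A ≈ U W with a truncated SVD, one simple way to allow signed activations that NMF-based methods like CRAFT forbid. All shapes and variable names (`A`, `U`, `W`, `k`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for full-image activations stacked across ALL
# categories (n_images x d), so the dictionary below is shared globally.
A = rng.normal(size=(200, 64))
k = 10  # number of shared concepts (illustrative choice)

# Low-rank factorization A ~= U @ W with NO non-negativity constraint;
# truncated SVD is used here purely for illustration.
Uf, s, Vt = np.linalg.svd(A, full_matrices=False)
U = Uf[:, :k] * s[:k]   # signed concept activations (n_images x k)
W = Vt[:k]              # shared global concept dictionary (k x d)

# Relative reconstruction error of the k-concept approximation.
err = np.linalg.norm(A - U @ W) / np.linalg.norm(A)
```

Because `U` is unconstrained, a concept can contribute negatively to an image's representation, which is exactly what the relaxed formulation permits and NMF does not.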
Primary Area: interpretability and explainable AI
Submission Number: 2660