Abstract: The trade-off between accuracy and interpretability has long been a challenge in machine learning (ML). This tension is particularly acute for emerging *interpretable-by-design* methods, which redesign ML algorithms for trustworthy interpretability but often sacrifice accuracy in the process. In this paper, we address this challenge by investigating how deviations in concept representations, an essential component of interpretable models, affect prediction performance, and we propose a novel framework to mitigate these effects. The framework builds on the principle of optimizing concept embeddings under constraints that preserve interpretability. Using a generative model as a test-bed, we rigorously prove that our algorithm achieves zero loss while progressively enhancing the interpretability of the resulting model. We also evaluate the practical performance of the proposed framework on explainable image classification. Compared to existing explainable methods, our approach not only improves prediction accuracy while preserving interpretability across various large-scale benchmarks, but also does so at significantly lower computational cost.
Lay Summary: In recent years, artificial intelligence (AI) systems—especially deep learning models—have achieved remarkable success in tasks like image recognition. However, these models often operate as "black boxes," making it difficult for users to understand why a model makes a certain decision. This lack of transparency raises concerns in high-stakes applications such as healthcare, finance, and criminal justice.
Our paper introduces a new method called Constrained Concept Refinement (CCR) to help make AI decisions more interpretable and trustworthy. The key idea is to guide the model’s internal reasoning using human-understandable concepts (such as “has wings” or “is furry”) while enforcing constraints that make these explanations consistent, sparse, and grounded in real data. Unlike many previous methods, CCR is both easy to implement and computationally efficient. It also allows users to fine-tune explanations by adjusting a few intuitive parameters, without requiring deep changes to the underlying model.
Through experiments on image classification tasks, we show that CCR produces clearer and more faithful concept-based explanations compared to existing approaches. Our method provides a practical step toward building AI systems that are not only powerful but also more understandable and trustworthy to human users.
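To make the idea of "optimizing concept embeddings under constraints" concrete, below is a minimal, hypothetical PyTorch sketch of a constrained refinement step: concept embeddings initialized from human-readable concepts are refined by gradient descent, then projected back into a small ball around their initial values so explanations stay grounded, with a sparsity penalty on the concept-to-class weights. All names (`ConceptBottleneck`, `eps`, `lambda_sparse`) are illustrative assumptions rather than the paper's actual API; see the linked repository for the real implementation.

```python
# Hypothetical sketch of constrained concept refinement (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptBottleneck(nn.Module):
    """Predicts class logits from interpretable concept activations."""

    def __init__(self, init_concepts: torch.Tensor, num_classes: int):
        super().__init__()
        # init_concepts: (num_concepts, feat_dim), e.g. embeddings of
        # human-readable concepts such as "has wings" or "is furry".
        self.register_buffer("init_concepts", init_concepts.clone())
        self.concepts = nn.Parameter(init_concepts.clone())  # refined embeddings
        self.classifier = nn.Linear(init_concepts.shape[0], num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        scores = features @ self.concepts.t()  # concept activations (interpretable layer)
        return self.classifier(scores)

    @torch.no_grad()
    def project_concepts(self, eps: float) -> None:
        # Constraint: keep each refined embedding within an eps-ball of its
        # initial, human-grounded embedding so explanations remain faithful.
        delta = self.concepts - self.init_concepts
        norm = delta.norm(dim=1, keepdim=True).clamp_min(1e-12)
        self.concepts.copy_(self.init_concepts + delta * torch.clamp(eps / norm, max=1.0))


def train_step(model, features, labels, opt, eps=0.1, lambda_sparse=1e-3):
    opt.zero_grad()
    logits = model(features)
    # L1 penalty on concept-to-class weights encourages sparse, concise explanations.
    loss = F.cross_entropy(logits, labels) + lambda_sparse * model.classifier.weight.abs().sum()
    loss.backward()
    opt.step()
    model.project_concepts(eps)  # projected gradient step enforcing the constraint
    return loss.item()
```

In this reading, `eps` and `lambda_sparse` play the role of the few intuitive knobs mentioned above: a smaller `eps` keeps explanations closer to the original human concepts, while a larger `lambda_sparse` makes them sparser.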
Link To Code: https://github.com/lianggeyuleo/CCR.git
Primary Area: Optimization
Keywords: Explainable AI, Interpretable ML, Optimization, Dictionary Learning, Computer Vision
Submission Number: 11262