Concept Bottleneck Model with Zero Performance Loss

Published: 11 Feb 2025, Last Modified: 06 Mar 2025. CPAL 2025 (Proceedings Track) Poster. License: CC BY 4.0
Keywords: interpretability, explainability, concept bottleneck model, concept explanations
Abstract: Interpreting machine learning models with high-level, human-understandable concepts has gained increasing importance. The concept bottleneck model (CBM) is a popular approach for providing such explanations, but it typically sacrifices some prediction power compared with standard black-box models. In this work, we propose CBM-zero, an approach that turns an off-the-shelf black-box model into a CBM without changing its predictions or compromising prediction power. Through an invertible mapping from the model's latent space to a concept space, predictions are decomposed into a linear combination of concepts. This yields concept-based explanations for the complex model and allows us to intervene in its predictions manually. Experiments across benchmarks demonstrate that CBM-zero offers explainability comparable to other CBM methods, with better accuracy.
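The core idea in the abstract can be illustrated with a minimal numerical sketch. Assuming the black-box model ends in a linear head over latent features, and taking a square invertible matrix as a stand-in for the learned latent-to-concept map (the paper's actual mapping and training procedure are not shown here), the prediction rewrites exactly as a linear combination of concept activations:

```python
import numpy as np

# Hypothetical sketch, not the authors' implementation: an invertible map W
# from the latent space to a concept space lets a linear prediction head be
# rewritten over concepts without changing the model's output.
rng = np.random.default_rng(0)

d = 4                        # latent dimension (assumed equal to concept count)
z = rng.normal(size=d)       # latent features from a black-box backbone (stand-in)
v = rng.normal(size=d)       # original linear prediction head
W = rng.normal(size=(d, d))  # invertible latent-to-concept mapping (random here)

c = W @ z                    # concept activations
u = np.linalg.inv(W).T @ v   # concept-space weights: u = (W^{-1})^T v

# Prediction is unchanged: v.z == u.c, i.e. the output is now a linear
# combination of concepts with zero performance loss.
assert np.isclose(v @ z, u @ c)
```

Because the map is invertible, the decomposition is exact rather than approximate, which is why predictions are preserved; intervening on an entry of `c` and mapping back through `W^{-1}` would correspond to the manual intervention the abstract mentions.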
Submission Number: 71