Counterfactual Concept Bottleneck Models

Gabriele Dominici; Pietro Barbiero; Francesco Giannini; Martin Gjoreski; Giuseppe Marra; Marc Langheinrich

Counterfactual Concept Bottleneck Models

Gabriele Dominici, Pietro Barbiero, Francesco Giannini, Martin Gjoreski, Giuseppe Marra, Marc Langheinrich

Published: 22 Jan 2025, Last Modified: 20 Mar 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Concept Bottleneck Models, Concept Based Model, Counterfactuals, Explainable AI, Interpretability

TL;DR: CounterFactual Concept Bottleneck Models predict, simulate, and generate alternative scenarios in a single step, without needing post-hoc analysis.

Abstract: Current deep learning models are not designed to simultaneously address three fundamental questions: predict class labels to solve a given classification task (the "What?"), simulate changes in the situation to evaluate how this impacts class predictions (the "How?"), and imagine how the scenario should change to result in different class predictions (the "Why not?"). While current approaches in causal representation learning and concept interpretability are designed to address some of these questions individually (such as Concept Bottleneck Models, which address both ``what'' and ``how'' questions), no current deep learning model is specifically built to answer all of them at the same time. To bridge this gap, we introduce CounterFactual Concept Bottleneck Models (CF-CBMs), a class of models designed to efficiently address the above queries all at once without the need to run post-hoc searches. Our experimental results demonstrate that CF-CBMs: achieve classification accuracy comparable to black-box models and existing CBMs (“What?”), rely on fewer important concepts leading to simpler explanations (“How?”), and produce interpretable, concept-based counterfactuals (“Why not?”). Additionally, we show that training the counterfactual generator jointly with the CBM leads to two key improvements: (i) it alters the model's decision-making process, making the model rely on fewer important concepts (leading to simpler explanations), and (ii) it significantly increases the causal effect of concept interventions on class predictions, making the model more responsive to these changes.

Supplementary Material: zip

Primary Area: interpretability and explainable AI

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7279

Loading