Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: interpretability, explainability, concepts, concept bottleneck models, model interventions, healthcare
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We introduce concept-based interventions for black-box models, formalise the model's intervenability as a measure of intervention effectiveness, and propose a fine-tuning procedure to improve intervenability.
Abstract: Recently, interpretable machine learning has re-explored concept bottleneck models (CBMs), which first predict high-level concepts from the raw features and then predict the target variable from these concepts. A compelling advantage of this model class is the user's ability to intervene on the predicted concept values, thereby affecting the model's downstream output. In this work, we introduce a method for performing such concept-based interventions on already-trained neural networks, which are not interpretable by design. Furthermore, we formalise the model's *intervenability* as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black-box models. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of the proposed techniques, we apply them to deep chest X-ray classifiers and show that fine-tuned black boxes can be as intervenable as, and more performant than, CBMs.
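To make the abstract's two central notions concrete, below is a minimal sketch in PyTorch, not the authors' implementation, of (i) a concept-based intervention on a black box via a probe attached to its internal representation and (ii) intervenability measured as the loss improvement such an intervention yields. The `encoder`/`head`/`probe` modules, their sizes, and the optimisation-based edit are illustrative assumptions.

```python
# Hypothetical sketch: concept-based intervention on a black box f = head(encoder(x)).
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))  # h: x -> z
head = nn.Linear(8, 1)                                                   # g: z -> y
probe = nn.Linear(8, 4)                                                  # q: z -> concept logits

def intervene(x, concepts, steps=100, lr=0.1, lam=1.0):
    """Edit the representation z so the probe matches the user-provided concept
    values, while keeping z close to its original value; then recompute the
    downstream prediction from the edited representation."""
    with torch.no_grad():
        z0 = encoder(x)
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (F.binary_cross_entropy_with_logits(probe(z), concepts)
                + lam * (z - z0).pow(2).mean())  # concept match + proximity
        loss.backward()
        opt.step()
    return torch.sigmoid(head(z.detach()))

def intervenability(x, y, concepts):
    """One hedged reading of intervenability: the drop in predictive loss when
    intervening with the ground-truth concepts (larger is better)."""
    with torch.no_grad():
        before = F.binary_cross_entropy(torch.sigmoid(head(encoder(x))), y)
    after = F.binary_cross_entropy(intervene(x, concepts), y)
    return (before - after).item()

x = torch.randn(4, 16)
y = torch.randint(0, 2, (4, 1)).float()
c = torch.randint(0, 2, (4, 4)).float()  # edited / ground-truth concept values
print(intervene(x, c))
print(intervenability(x, y, c))
```

Under this reading, the fine-tuning procedure mentioned in the abstract would, loosely, optimise the model so that such interventions become more effective; the precise objective is given in the paper itself.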
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5402