Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?

Published: 27 Oct 2023, Last Modified: 07 Nov 2023, NeurIPS XAIA 2023
TL;DR: We introduce concept-based interventions for black-box models, formalise the model's intervenability as a measure of intervention effectiveness, and propose a fine-tuning procedure to improve intervenability.
Abstract: Recently, interpretable machine learning has re-explored concept bottleneck models (CBM), which first predict high-level concepts from the raw features and then predict the target variable from those concepts. A compelling advantage of this model class is the user's ability to intervene on the predicted concept values, thereby affecting the model's downstream output. In this work, we introduce a method to perform such concept-based interventions on already-trained neural networks, which are not interpretable by design. Furthermore, we formalise the model's *intervenability* as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black-box models. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of the proposed techniques, we apply them to chest X-ray classifiers and show that fine-tuned black boxes can be as intervenable as, and more performant than, CBMs.
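The abstract does not spell out the intervention mechanism, so the following is a minimal PyTorch sketch of one plausible way to intervene on a trained black box: it assumes the network can be split into a feature extractor `h` and a prediction head `g`, and that a separately trained linear probe maps intermediate representations to concepts. All class and function names, dimensions, the representation-editing objective, and the `intervenability` helper are illustrative assumptions for this sketch, not the paper's exact procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlackBox(nn.Module):
    """Hypothetical black-box classifier split into a feature extractor h and a head g."""
    def __init__(self, in_dim=32, rep_dim=16, num_classes=2):
        super().__init__()
        self.h = nn.Sequential(nn.Linear(in_dim, rep_dim), nn.ReLU())  # x -> representation z
        self.g = nn.Linear(rep_dim, num_classes)                       # z -> class logits

    def forward(self, x):
        return self.g(self.h(x))

def intervene(model, probe, x, concepts, steps=50, lr=0.1, lam=1.0):
    """Edit the intermediate representation so the concept probe agrees with the
    user-supplied concept values, then re-run the prediction head on the edit."""
    z0 = model.h(x).detach()
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Stay close to the original representation while matching the given concepts.
        loss = lam * F.mse_loss(z, z0) + F.binary_cross_entropy_with_logits(probe(z), concepts)
        loss.backward()
        opt.step()
    return model.g(z.detach())

def intervenability(model, probe, x, concepts, y, loss_fn=F.cross_entropy):
    """One plausible reading of 'intervenability' (an assumption, not the paper's formula):
    the reduction in prediction loss obtained by intervening with the given concepts."""
    before = loss_fn(model(x), y)
    after = loss_fn(intervene(model, probe, x, concepts), y)
    return (before - after).item()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = BlackBox()
    probe = nn.Linear(16, 4)                      # maps z to 4 binary concepts (trained separately)
    x = torch.randn(1, 32)                        # a single input
    concepts = torch.tensor([[1., 0., 1., 0.]])   # concept values supplied by the user
    y = torch.tensor([1])                         # hypothetical ground-truth label
    print("logits before intervention:", model(x))
    print("logits after intervention: ", intervene(model, probe, x, concepts))
    print("loss reduction (intervenability estimate):", intervenability(model, probe, x, concepts, y))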
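```

In this sketch, intervening means nudging the intermediate representation until the probe agrees with the user-supplied concepts while staying close to the original representation; under the same assumptions, fine-tuning for intervenability would then amount to maximising the resulting loss reduction, which the `intervenability` helper estimates for a single example.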
Submission Track: Full Paper Track
Application Domain: Healthcare
Survey Question 1: This work proposes a way to interact with black-box neural networks, which are known to be opaque and hard to understand, and to improve that interaction. Using the proposed techniques, a human user may specify high-level, understandable attributes, which are then used to modify the network's prediction. For example, consider a physician interacting with a neural network that predicts a patient's diagnosis from a chest X-ray: the doctor can specify the findings they made and thus steer the prediction.
Survey Question 2: This work originated from collaborations with physicians and the realisation that predictive models, when incorporated into the clinical workflow, should ideally be interactive, allowing a human expert to easily influence their output. Current methods in explainable and interpretable ML focus on passive model introspection and are rarely actionable, as they are sensitive to design choices and hard to validate. Finally, our work helps formalise the notion of intervenability, which has implicitly been the focus of many previous works.
Survey Question 3: We utilise concept bottleneck models (CBM) and concept-based explanations and propose novel techniques.
Submission Number: 21