Debugging Concept Bottlenecks through Intervention: Shortcut Removal + Retraining

Published: 06 Mar 2025, Last Modified: 06 Mar 2025, SCSL @ ICLR 2025, CC BY 4.0
Track: regular paper (up to 6 pages)
Keywords: concept bottleneck, prototypes, interpretability, shortcut learning, robustness
TL;DR: Human debugging of interpretable concept bottlenecks
Abstract: Machine learning models often learn unintended shortcuts (spurious correlations) that do not reflect the true causal structure of a task and thus degrade dramatically under subpopulation shift. This problem becomes especially severe in high-stakes domains where the cost of relying on misaligned shortcuts is prohibitive. To address this challenge, concept bottlenecks explicitly factor predictions into high-level concepts and a simple decision layer, enabling experts to diagnose whether learned concepts align with their domain knowledge. Yet, simply removing undesirable concepts after training is insufficient to prevent shortcuts when the concept encoder is incomplete or entangled. In this work, we propose *CBDebug*, a novel framework to debug concept bottlenecks for robustness under subpopulation shift. First, a domain expert identifies and removes spurious concepts using model explanations (the *Removal* step). Then, leveraging this human feedback, we disentangle or replace the removed shortcuts by retraining on a rebalanced dataset based on the causal graph (the *Retraining* step). Empirically, *CBDebug* significantly outperforms existing concept-based methods. Overall, our work demonstrates how expert-guided debugging of concept bottlenecks can achieve interpretability and robustness, promoting alignment of a model’s internal reasoning with how humans reason.
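The two steps the abstract describes can be illustrated with a minimal toy sketch. This is not the authors' implementation: the data, the masking of a single flagged concept (the *Removal* step), and the inverse-frequency reweighting over (group, label) cells (the *Retraining* step) are all illustrative assumptions, with a simple logistic-regression decision layer standing in for the concept bottleneck's final layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data with 3 concepts; concept index 2 is a shortcut
# that agrees with the label only in the majority group.
n = 400
group = rng.integers(0, 2, n)             # 0 = majority, 1 = minority
y = rng.integers(0, 2, n)
c_true = y + 0.1 * rng.normal(size=n)     # causal concept
c_noise = rng.normal(size=n)              # irrelevant concept
c_spur = np.where(group == 0, y, rng.integers(0, 2, n)) + 0.1 * rng.normal(size=n)
C = np.stack([c_true, c_noise, c_spur], axis=1)

def fit_decision_layer(C, y, sample_w, epochs=300, lr=0.5):
    """Logistic-regression decision layer, trained by weighted gradient descent."""
    w = np.zeros(C.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(C @ w + b)))
        g = sample_w * (p - y)            # weighted logistic gradient
        w -= lr * (C.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# Removal step: the expert flags concept 2 as spurious, so it is masked out.
mask = np.array([1.0, 1.0, 0.0])

# Retraining step: rebalance so every (group, label) cell carries equal weight.
cell = group * 2 + y
counts = np.bincount(cell, minlength=4).astype(float)
sample_w = (len(y) / (4 * counts))[cell]

w, b = fit_decision_layer(C * mask, y, sample_w)
acc = (((C * mask) @ w + b > 0).astype(int) == y).mean()
```

Because the spurious column is zeroed before retraining, its weight stays at zero, and the rebalanced decision layer must rely on the causal concept, which is the intuition behind retraining rather than merely pruning the concept post hoc.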
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Presenter: ~Eric_Enouen1
Submission Number: 6