Debugging Concept Bottlenecks through Intervention: Shortcut Removal + Retraining

Published: 06 Mar 2025, Last Modified: 06 Mar 2025, SCSL @ ICLR 2025, CC BY 4.0
Track: regular paper (up to 6 pages)
Keywords: concept bottleneck, prototypes, interpretability, shortcut learning, robustness
TL;DR: Human debugging of interpretable concept bottlenecks
Abstract: Machine learning models often learn unintended shortcuts (spurious correlations) that do not reflect the true causal structure of a task and thus degrade dramatically under subpopulation shift. This problem becomes especially severe in high-stakes domains where the cost of relying on misaligned shortcuts is prohibitive. To address this challenge, concept bottlenecks explicitly factor predictions into high-level concepts and a simple decision layer, enabling experts to diagnose whether learned concepts align with their domain knowledge. Yet, simply removing undesirable concepts after training is insufficient to prevent shortcuts when the concept encoder is incomplete or entangled. In this work, we propose *CBDebug*, a novel framework to debug concept bottlenecks for robustness under subpopulation shift. First, a domain expert identifies and removes spurious concepts using model explanations (the *Removal* step). Then, leveraging this human feedback, we disentangle or replace the removed shortcuts by retraining on a rebalanced dataset based on the causal graph (the *Retraining* step). Empirically, *CBDebug* significantly outperforms existing concept-based methods. Overall, our work demonstrates how expert-guided debugging of concept bottlenecks can achieve interpretability and robustness, promoting alignment of a model’s internal reasoning with how humans reason.
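The two steps the abstract describes can be illustrated with a minimal toy sketch. This is not the authors' implementation: the data, the masking of a single flagged concept (the *Removal* step), and the inverse-frequency reweighting over (group, label) cells (the *Retraining* step) are all illustrative assumptions, with a simple logistic-regression decision layer standing in for the concept bottleneck's final layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data with 3 concepts; concept index 2 is a shortcut
# that agrees with the label only in the majority group.
n = 400
group = rng.integers(0, 2, n)             # 0 = majority, 1 = minority
y = rng.integers(0, 2, n)
c_true = y + 0.1 * rng.normal(size=n)     # causal concept
c_noise = rng.normal(size=n)              # irrelevant concept
c_spur = np.where(group == 0, y, rng.integers(0, 2, n)) + 0.1 * rng.normal(size=n)
C = np.stack([c_true, c_noise, c_spur], axis=1)

def fit_decision_layer(C, y, sample_w, epochs=300, lr=0.5):
    """Logistic-regression decision layer, trained by weighted gradient descent."""
    w = np.zeros(C.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(C @ w + b)))
        g = sample_w * (p - y)            # weighted logistic gradient
        w -= lr * (C.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# Removal step: the expert flags concept 2 as spurious, so it is masked out.
mask = np.array([1.0, 1.0, 0.0])

# Retraining step: rebalance so every (group, label) cell carries equal weight.
cell = group * 2 + y
counts = np.bincount(cell, minlength=4).astype(float)
sample_w = (len(y) / (4 * counts))[cell]

w, b = fit_decision_layer(C * mask, y, sample_w)
acc = (((C * mask) @ w + b > 0).astype(int) == y).mean()
```

Because the spurious column is zeroed before retraining, its weight stays at zero, and the rebalanced decision layer must rely on the causal concept, which is the intuition behind retraining rather than merely pruning the concept post hoc.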
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Presenter: ~Eric_Enouen1
Submission Number: 6