Keywords: Concept Bottleneck Models, Information Bottleneck, Variational Inference
TL;DR: Enhances Concept Bottleneck Models by integrating the Information Bottleneck principle to reduce concept leakage and improve performance
Abstract: Concept Bottleneck Models (CBMs) promise interpretable prediction by forcing all information to flow through a human-understandable "concept" layer, but this interpretability often comes at the cost of reduced accuracy and concept leakage. To address this, we introduce an explicit Information Bottleneck (IB) regularizer on the concept layer that penalizes $I(X;C)$, encouraging minimal yet task-relevant concept representations. We derive two variants of this penalty and integrate them into the standard CBM training objective. Across six model families (hard/soft CBMs trained jointly or independently, ProbCBM, AR-CBM, and CEM) and three benchmark datasets (CUB, AwA2, aPY), IB-regularized models consistently outperform their vanilla counterparts, narrowing and in some cases closing the accuracy gap to unconstrained black-box networks. We further quantify concept leakage with two metrics (Oracle Impurity and Niche Impurity Scores) and show that IB constraints significantly reduce leakage, yielding more disentangled concepts. To assess how well concept sets support test-time corrections, we employ two intervention metrics (area under the intervention-accuracy curve and average marginal gain per intervened concept) and demonstrate that IB-regularized CBMs retain higher intervention gains even when large fractions of concepts are corrupted. Our results show that enforcing a minimal yet sufficient concept bottleneck improves both predictive performance and the reliability of concept-level interventions, thereby closing the accuracy gap of CBMs while improving their interpretability and amenability to intervention.
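For concreteness, one plausible form of the IB-regularized objective is sketched below; this is a reading of the abstract, not the submission's exact formulation. Writing $q_\phi(c \mid x)$ for a stochastic concept encoder, $g$ for the label predictor, $c^\star$ for ground-truth concept annotations, and $r(c)$ for a fixed variational prior, the standard variational upper bound $I(X;C) \le \mathbb{E}_x\, D_{\mathrm{KL}}\big(q_\phi(c \mid x)\,\|\,r(c)\big)$ gives a tractable penalty to add to the usual CBM losses:

$$
\mathcal{L} \;=\; \mathbb{E}_{(x,y)}\!\left[\, \ell_Y\big(g(c),\, y\big) \;+\; \lambda\, \ell_C\big(c,\, c^\star\big) \;+\; \beta\, D_{\mathrm{KL}}\big(q_\phi(c \mid x)\,\big\|\, r(c)\big) \right], \qquad c \sim q_\phi(c \mid x),
$$

where $\ell_Y$ and $\ell_C$ are the task and concept losses, and $\lambda,\beta$ trade off concept supervision against compression. The symbols $\ell_Y$, $\ell_C$, $\lambda$, $\beta$, and $r$ are illustrative placeholders and do not come from the submission itself.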
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 8149