Adaptive Test-Time Intervention for Concept Bottleneck Models

Published: 05 Mar 2025, Last Modified: 14 Apr 2025 · BuildingTrust · CC BY 4.0
Track: Tiny Paper Track (between 2 and 4 pages)
Keywords: interpretable machine learning, distillation, test-time intervention
TL;DR: Distilling the concept-to-target portion of Concept Bottleneck Models into interpretable tree-based models enables adaptive test-time intervention.
Abstract: Concept bottleneck models (CBMs) aim to improve model interpretability by predicting human-level "concepts" in a bottleneck within a deep learning model architecture. However, how the predicted concepts are used to predict the target either remains black-box or is simplified to maintain interpretability at the cost of prediction performance. We propose to use Fast Interpretable Greedy-Tree Sums (FIGS) to perform Binary Distillation (BD). This new method, FIGS-BD, distills a binary-augmented concept-to-target portion of the CBM into an interpretable tree-based model while maintaining the competitive prediction performance of the CBM teacher. FIGS-BD can be used in downstream tasks to explain and decompose CBM predictions into interpretable binary-concept-interaction attributions and to guide adaptive test-time intervention. Across $4$ datasets, we demonstrate that our adaptive test-time intervention identifies key concepts that significantly improve performance in realistic human-in-the-loop settings that allow for only a limited number of concept interventions.
Submission Number: 117
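
The distillation step the abstract describes can be illustrated with a minimal sketch using the `FIGSClassifier` from the `imodels` library (the reference FIGS implementation). This is not the authors' released code: the synthetic data, the 0.5 binarization threshold, and all variable names (`concept_probs`, `cbm_preds`, `binary_concepts`) are illustrative stand-ins for a trained CBM's outputs.

```python
# Hedged sketch: distill the concept-to-target portion of a trained CBM
# into a FIGS tree-sum student, using predicted concepts binarized at 0.5.
# Assumes `imodels` is installed (pip install imodels); data is synthetic.
import numpy as np
from imodels import FIGSClassifier

rng = np.random.default_rng(0)
n_samples, n_concepts = 1000, 10

# Stand-ins for the CBM's outputs on a distillation set:
# predicted concept probabilities and the teacher's hard target predictions.
concept_probs = rng.random((n_samples, n_concepts))
cbm_preds = (concept_probs[:, 0] * concept_probs[:, 1] > 0.25).astype(int)

# Binary augmentation: threshold concept probabilities into 0/1 features.
binary_concepts = (concept_probs >= 0.5).astype(int)

# Distill: fit FIGS to mimic the teacher's concept-to-target mapping.
student = FIGSClassifier(max_rules=12)
student.fit(binary_concepts, cbm_preds)

# The fitted sum of shallow trees decomposes each prediction into
# binary-concept-interaction contributions; concepts in the most
# influential splits are natural candidates for test-time intervention.
print(student)  # prints the learned tree-sum rules
fidelity = (student.predict(binary_concepts) == cbm_preds).mean()
print(f"teacher-student agreement (fidelity): {fidelity:.3f}")
```

In this sketch, fidelity (agreement between student and teacher on the distillation set) stands in for the "competitive prediction performance" claim; the printed tree sums show how a prediction decomposes additively across shallow trees of binarized concepts.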