Keywords: concepts, interpretability, concept bottleneck models, model editing
TL;DR: We present a method to turn any neural network into a concept bottleneck model without sacrificing model performance, retaining interpretability benefits and enabling easy model editing.
Abstract: Concept Bottleneck Models (CBMs) map inputs onto a concept bottleneck and use the bottleneck to make a prediction. A concept bottleneck enhances interpretability, since it can be investigated to understand which concepts the model sees in an input and which of these concepts it deems important. However, CBMs are restrictive in practice, as they require concept labels during training to learn the bottleneck. Additionally, it is questionable whether CBMs can match the accuracy of an unrestricted neural network trained on a given domain, potentially reducing the incentive to deploy them in practice. In this work, we address these two key limitations by introducing Post-hoc Concept Bottleneck models (P-CBMs). We show that we can turn any neural network into a P-CBM without sacrificing model performance while retaining its interpretability benefits. Finally, we show that P-CBMs can provide significant performance gains through model editing, without any fine-tuning or data from the target domain.
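To make the idea concrete, here is a minimal sketch of a post-hoc concept bottleneck: embeddings from a frozen, pretrained backbone are projected onto a set of concept directions, and an interpretable linear head predicts from the resulting concept scores. All names, dimensions, and the use of random data here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a frozen backbone maps inputs to d-dimensional embeddings.
d, n_concepts, n_classes, n_samples = 16, 5, 3, 4
embeddings = rng.normal(size=(n_samples, d))        # f(x) from any pretrained network
concept_vectors = rng.normal(size=(n_concepts, d))  # one direction per concept

# Post-hoc bottleneck: project each embedding onto the concept directions.
concept_scores = embeddings @ concept_vectors.T     # shape (n_samples, n_concepts)

# Interpretable head: a linear classifier over concept scores, so each
# prediction decomposes into per-concept contributions (scores * weights).
W = rng.normal(size=(n_concepts, n_classes))
logits = concept_scores @ W
predictions = logits.argmax(axis=1)
```

Because the head is linear over concept scores, "model editing" can be as simple as zeroing a concept's weight row to remove its influence, which is one reason no fine-tuning or target-domain data is needed in this sketch.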
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/arxiv:2205.15480/code)