Addressing Leakage in Concept Bottleneck Models

Published: 31 Oct 2022, 18:00
Last Modified: 11 Oct 2022, 18:20
NeurIPS 2022 Accept
Readers: Everyone
Keywords: interpretable models, concept bottleneck model, leakage
TL;DR: Leakage adversely affects the performance and interpretability of concept bottleneck models. We address the underlying causes.
Abstract: Concept bottleneck models (CBMs) enhance the interpretability of their predictions by first predicting high-level concepts from features and then predicting outcomes on the basis of these concepts. Recently, it was demonstrated that training the label predictor directly on the probabilities produced by the concept predictor, as opposed to the ground-truth concepts, improves label predictions. However, this results in corruptions in the concept predictions that impact the concept accuracy as well as our ability to intervene on the concepts, a key proposed benefit of CBMs. In this work, we investigate and address two issues with CBMs that cause this disparity in performance: having an insufficient concept set and using an inexpressive concept predictor. With our modifications, CBMs become competitive in predictive performance with models that otherwise leak additional information in the concept probabilities, while having dramatically increased concept accuracy and intervention accuracy.
Supplementary Material: pdf
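
The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear predictors, random weights, array sizes, the 0.5 threshold, and the helper names (`predict_concepts`, `predict_label`, `intervene`) are all assumptions chosen for illustration. It shows a label predictor fed with soft concept probabilities (the setting in which extra information can leak through the probabilities), the same predictor fed with binarized concepts, and an intervention that overwrites one predicted concept with its ground-truth value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_concepts(x, W_c, b_c):
    """Concept predictor: features -> per-concept probabilities."""
    return sigmoid(x @ W_c + b_c)

def predict_label(c, W_y, b_y):
    """Label predictor: concept vector -> class logits."""
    return c @ W_y + b_y

def intervene(c_hat, c_true, mask):
    """Overwrite the masked concepts with their ground-truth values."""
    return np.where(mask, c_true, c_hat)

# Hypothetical sizes: samples, features, concepts, classes.
n, d, k, m = 4, 6, 3, 2
rng = np.random.default_rng(0)
x = rng.normal(size=(n, d))
c_true = rng.integers(0, 2, size=(n, k)).astype(float)

# Untrained random weights, used only to make the sketch runnable.
W_c, b_c = rng.normal(size=(d, k)), np.zeros(k)
W_y, b_y = rng.normal(size=(k, m)), np.zeros(m)

c_soft = predict_concepts(x, W_c, b_c)   # probabilities in (0, 1)
c_hard = (c_soft >= 0.5).astype(float)   # binarized concepts (assumed threshold)

logits_soft = predict_label(c_soft, W_y, b_y)  # label predictor sees probabilities
logits_hard = predict_label(c_hard, W_y, b_y)  # label predictor sees binary concepts

# Intervene on the first concept only: replace it with the ground truth.
mask = np.zeros((n, k), dtype=bool)
mask[:, 0] = True
c_fixed = intervene(c_hard, c_true, mask)
logits_fixed = predict_label(c_fixed, W_y, b_y)
```

In this sketch the soft vector `c_soft` can encode more than the binary concept values, which is the channel through which leakage can occur, while `c_hard` carries only one bit per concept. Thresholding is used here purely as an illustration and is not claimed to be the paper's remedy.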