Do Concept Bottleneck Models Obey Locality?

Published: 27 Oct 2023, Last Modified: 21 Nov 2023. NeurIPS XAIA 2023.
TL;DR: We examine whether concept bottleneck models obey certain invariances, such as whether concept predictions are impacted only by changes to the corresponding region of the input.
Abstract: Concept-based learning improves a deep learning model's interpretability by explaining its predictions via human-understandable concepts. Deep learning models trained under this paradigm heavily rely on the assumption that neural networks can learn to predict the presence or absence of a given concept independently of other concepts. Recent work, however, strongly suggests that this assumption may fail to hold in Concept Bottleneck Models (CBMs), a quintessential family of concept-based interpretable architectures. In this paper, we investigate whether CBMs correctly capture the degree of conditional independence across concepts when such concepts are localised both \textit{spatially}, by having their values entirely defined by a fixed subset of features, and \textit{semantically}, by having their values correlated with only a fixed subset of predefined concepts. To understand locality, we analyse how changes to features outside of a concept's spatial or semantic locality impact concept predictions. Our results suggest that even in well-defined scenarios where the presence of a concept is localised to a fixed feature subspace, or whose semantics are correlated to a small subset of other concepts, CBMs fail to learn this locality. These results cast doubt upon the quality of concept representations learnt by CBMs and strongly suggest that concept-based explanations may be fragile to changes outside their localities.
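The spatial-locality analysis described above (perturbing features outside a concept's locality and measuring the change in that concept's prediction) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `concept_fn`, `locality_sensitivity`, the noise model, and the toy linear predictor are all assumptions introduced here for clarity.

```python
import numpy as np

def locality_sensitivity(concept_fn, x, mask, noise_scale=1.0, n_trials=20, seed=0):
    """Mean absolute change in a concept's prediction when ONLY features
    outside its spatial locality (mask == False) are perturbed.
    A perfectly local concept predictor would score ~0."""
    rng = np.random.default_rng(seed)
    base = concept_fn(x)
    deltas = []
    for _ in range(n_trials):
        x_pert = x.copy()
        # Perturb only the out-of-locality features.
        x_pert[~mask] += rng.normal(0.0, noise_scale, size=x.shape)[~mask]
        deltas.append(abs(concept_fn(x_pert) - base))
    return float(np.mean(deltas))

# Toy example: a concept whose value depends only on the first half of the input.
rng = np.random.default_rng(1)
w = np.concatenate([rng.normal(size=8), np.zeros(8)])  # weights zero outside locality
concept_fn = lambda x: float(w @ x)
x = rng.normal(size=16)
mask = np.arange(16) < 8  # the concept's spatial locality

print(locality_sensitivity(concept_fn, x, mask))  # 0.0 for this perfectly local predictor
```

For a real CBM, `concept_fn` would be one output of the trained concept encoder; the paper's finding is that such a sensitivity score stays well above zero even when the concept's ground-truth value is fully determined inside the mask.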
Submission Track: Full Paper Track
Application Domain: None of the above / Not applicable
Clarify Domain: General method applicable to Concept Bottleneck Models
Survey Question 1: Explainable models fail certain sanity checks, making it difficult to trust the explanations they produce. For example, these models will use information from irrelevant features when predicting a particular concept, and changing concepts with no semantic relationship to a given concept can result in wildly different predictions for that concept.
Survey Question 2: We wanted to see whether current explainable models can truly be trusted; models which do not elicit such trust cannot be deployed into the real world, as it's hard to know whether explanations from such models are truly a reflection of how predictions were made.
Survey Question 3: We used Concept Bottleneck Models.
Submission Number: 63