Spatial Compositional Counterfactuals in Concept Bottleneck Models

Published: 27 May 2026, Last Modified: 15 Jun 2026
Venue: CompLearn 2026 Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Concept Bottleneck Models, Counterfactual Interpretability
Abstract: Concept Bottleneck Models (CBMs) decompose images into semantic concepts to make predictions interpretable, but in doing so they collapse the images' spatial structure. Counterfactual CBMs identify which concepts must change to alter the model's prediction, but they do not attribute those changes to image regions. We introduce the Would-Have-Expected Concept Bottleneck Model (WHE-CBM), a spatial CBM that represents concepts as spatial maps and learns a counterfactual editor over those maps. Given a target object and a desired class, the editor predicts a sparse, continuous concept-logit delta whose sign, magnitude, and spatial support specify which semantic components must increase or decrease, and where, to flip the prediction. Across concept-controlled, label-free, and object-centric benchmarks, WHE-CBM improves counterfactual validity and sparsity over CF-CBM, localizes edit mass to the target object, and preserves non-target RoIs in multi-object scenes.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 214
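
As an illustration of the quantities the abstract names (sign, magnitude, spatial support, sparsity, and edit mass localized to the target object), the following is a minimal sketch and not the authors' implementation. It assumes a concept-logit delta stored as a `(K, H, W)` array and a boolean target-object mask of shape `(H, W)`; the function name `summarize_delta`, the threshold `eps`, and the toy values are all hypothetical.

```python
import numpy as np

def summarize_delta(delta, roi_mask, eps=1e-6):
    """Summarize a spatial concept-logit edit (hypothetical helper).

    delta:    (K, H, W) array, one signed edit map per concept (assumed layout).
    roi_mask: (H, W) boolean array marking the target object (assumed input).
    """
    magnitude = np.abs(delta)
    # Sparsity: fraction of concept-location entries left unedited.
    sparsity = 1.0 - (magnitude > eps).mean()
    # Share of the total absolute edit mass that falls inside the target RoI.
    total_mass = magnitude.sum()
    inside_mass = magnitude[:, roi_mask].sum()
    roi_fraction = inside_mass / total_mass if total_mass > 0 else 0.0
    # Per-concept signed mass: positive means "increase", negative "decrease".
    signed_mass = delta.sum(axis=(1, 2))
    return {
        "sparsity": float(sparsity),
        "roi_fraction": float(roi_fraction),
        "signed_mass": signed_mass,
    }

# Toy example: 3 concepts on a 4x4 grid, target object in the central 2x2 block.
delta = np.zeros((3, 4, 4))
delta[0, 1:3, 1:3] = 0.5   # raise concept 0 inside the target object
delta[2, 0, 0] = -1.0      # lower concept 2 at one location outside it

roi_mask = np.zeros((4, 4), dtype=bool)
roi_mask[1:3, 1:3] = True

summary = summarize_delta(delta, roi_mask)
print(summary)
```

In this toy case two thirds of the absolute edit mass lies inside the target mask. How the paper actually measures validity, sparsity, and localization may differ from these simple ratios.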