Keywords: Foundational work, Automated interpretability, Interpretability tooling and software
TL;DR: We propose a dual-modality Concept Bottleneck Model (MM-CBM) that achieves interpretable, faithful predictions while matching black-box performance and enabling zero-shot or unsupervised classification.
Abstract: Concept Bottleneck Models (CBMs) enhance the interpretability of deep learning networks by aligning the features extracted from images with natural concepts. However, existing CBMs are constrained by a fixed set of predefined classes and by the risk of non-concept information leakage, where predictive signals outside the intended concepts are inadvertently exploited. In this paper, we propose the Multimodal Concept Bottleneck Model (MM-CBM) to address these issues and extend CBMs to CLIP. MM-CBM uses dual Concept Bottleneck Layers (CBLs) to align both the image and text embeddings with interpretable features. This allows us to perform new vision tasks, such as classification with unseen classes or image retrieval, in an interpretable way. Compared to existing methods, MM-CBM achieves an average accuracy improvement of up to 43.96\% across four standard benchmarks. Our method maintains high accuracy, staying within ~5\% of black-box performance while offering greater interpretability.
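Since the abstract only outlines the architecture, the following is a minimal sketch of how dual Concept Bottleneck Layers on top of CLIP image and text embeddings might be wired up. All names (MMCBM, zero_shot_logits) and design details here are illustrative assumptions, not the paper's released implementation.

```python
# Illustrative sketch of the dual-bottleneck idea described in the abstract.
# Class/function names are assumptions; this is not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MMCBM(nn.Module):
    """Projects CLIP image and text embeddings into a shared concept space
    via two Concept Bottleneck Layers (one per modality)."""

    def __init__(self, embed_dim: int, num_concepts: int):
        super().__init__()
        self.image_cbl = nn.Linear(embed_dim, num_concepts)  # image -> concept scores
        self.text_cbl = nn.Linear(embed_dim, num_concepts)   # text  -> concept scores

    def forward(self, image_emb: torch.Tensor, text_emb: torch.Tensor):
        # Each output dimension corresponds to one named concept,
        # so activations from both modalities are directly interpretable.
        img_concepts = self.image_cbl(image_emb)
        txt_concepts = self.text_cbl(text_emb)
        return img_concepts, txt_concepts


def zero_shot_logits(img_concepts: torch.Tensor, class_concepts: torch.Tensor):
    """Classify by comparing an image's concept activations with the concept
    activations of (possibly unseen) class-name text prompts."""
    img = F.normalize(img_concepts, dim=-1)
    cls = F.normalize(class_concepts, dim=-1)
    return img @ cls.T  # (num_images, num_classes) similarity scores
```

In this reading, classification with unseen classes reduces to embedding new class-name prompts with the text encoder, passing them through the text CBL, and ranking images by concept-space similarity, with no retraining of the bottleneck.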
Submission Number: 20