Concept bottleneck models (CBMs) are inherently interpretable neural network models that explain their final label predictions through high-level semantic \textit{concepts} predicted in intermediate layers. Previous work on CBMs has achieved high-accuracy concept and label prediction without manually collected concept labels by incorporating large language models (LLMs) and vision-language models (VLMs). However, these models still require training on the target dataset to learn input-to-concept and concept-to-label correspondences, incurring the costs of collecting target datasets and of training. In this paper, we present \textit{zero-shot concept bottleneck models} (Z-CBMs), interpretable models that predict labels and concepts in a fully zero-shot manner without training any neural networks. Z-CBMs utilize a large-scale concept bank, composed of millions of noun phrases extracted from caption datasets, to describe arbitrary inputs across various domains. To infer input-to-concept correspondences, we introduce \textit{concept retrieval}, which dynamically searches for input-related concepts in the concept bank on the multi-modal feature space of pre-trained VLMs. This enables Z-CBMs to handle millions of concepts and to extract appropriate concepts for each input image. In the concept-to-label inference stage, we apply \textit{concept regression} to select important concepts from the retrieved candidates, which contain noisy, mutually related concepts. Concretely, concept regression estimates concept importance weights via sparse linear regression that approximates the input image feature vector by a weighted sum of concept feature vectors. Through extensive experiments, we confirm that Z-CBMs achieve both high target-task performance and interpretability without any additional training.
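A minimal sketch of the two-stage inference described above, under stated assumptions: the backbone is a CLIP model loaded via open_clip, the concept bank and class prompts are precomputed normalized text features, and scikit-learn's Lasso stands in for the sparse linear regression. The retrieval size `k`, regularization `alpha`, and all names (`zcbm_predict`, `concept_bank`, `concept_feats`, `class_feats`) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of Z-CBM inference: concept retrieval + concept regression.
# Assumptions: open_clip ViT-B-32 backbone; `concept_feats` (N, D) and
# `class_feats` (C, D) are precomputed, L2-normalized CLIP text features.
import numpy as np
import torch
import open_clip
from sklearn.linear_model import Lasso
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")

def encode_image(path):
    """Return the L2-normalized CLIP image feature as a NumPy vector."""
    with torch.no_grad():
        feat = model.encode_image(preprocess(Image.open(path)).unsqueeze(0))
    feat = feat / feat.norm(dim=-1, keepdim=True)
    return feat.squeeze(0).cpu().numpy()

def zcbm_predict(image_path, concept_bank, concept_feats, class_feats, k=512, alpha=1e-3):
    x = encode_image(image_path)                      # image feature, shape (D,)

    # Concept retrieval: top-k concepts by similarity in the shared feature space.
    sims = concept_feats @ x                          # (N,)
    top = np.argsort(-sims)[:k]
    C = concept_feats[top]                            # (k, D) candidate concept features

    # Concept regression: sparse weights w such that x ~= C^T w.
    reg = Lasso(alpha=alpha, fit_intercept=False, positive=True)
    reg.fit(C.T, x)                                   # design matrix (D, k), target (D,)
    w = reg.coef_                                     # (k,) concept importance weights
    x_hat = C.T @ w                                   # image feature reconstructed from concepts

    # Concept-to-label inference: zero-shot classification on the bottleneck feature.
    logits = class_feats @ x_hat
    concepts = [(concept_bank[top[i]], float(w[i])) for i in np.argsort(-w) if w[i] > 0]
    return int(np.argmax(logits)), concepts
```

The returned list of (concept, weight) pairs serves as the concept-based explanation, while the argmax over class logits gives the zero-shot label prediction.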
Keywords: concept bottleneck models, interpretability, retrieval, sparse linear regression, vision-language models
TL;DR: We propose an interpretable model family called zero-shot concept bottleneck models, which provides concept-based explanations for its predictions in a fully zero-shot manner.
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4238