Zero-shot Concept Bottleneck Models

18 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: concept bottleneck models, vision-language models
TL;DR: We introduce an interpretable and intervenable model family called zero-shot concept bottleneck models, which can provide concept-based explanations for its predictions in a fully zero-shot manner.
Abstract: Concept bottleneck models (CBMs) are inherently interpretable and intervenable neural network models, which explain their final class label predictions via intermediate predictions of high-level semantic concepts. However, they require target-task training to learn the input-to-concept and concept-to-class mappings, which necessitates collecting target datasets and significant training resources. In this paper, we present zero-shot concept bottleneck models (Z-CBMs), which predict concepts and labels in a fully zero-shot manner without additional training of neural networks. Z-CBMs leverage a large-scale concept bank, comprising millions of vocabulary terms extracted from the web, to describe diverse inputs across various domains. For the input-to-concept mapping, we introduce concept retrieval, which dynamically identifies input-related concepts through cross-modal search within the concept bank. For the concept-to-class inference, we apply concept regression, which selects essential concepts from the retrieved ones via sparse linear regression. Through extensive experiments, we demonstrate that our Z-CBMs provide interpretable and intervenable concepts without any additional training.
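The two stages summarized in the abstract could be sketched roughly as follows: concept retrieval as a cosine-similarity search over a bank of concept embeddings, and concept regression as a sparse linear fit of the input embedding to the retrieved concept embeddings. This is a minimal NumPy illustration under assumed CLIP-style shared embeddings; the bank contents, dimensions, and the ISTA-style solver are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def retrieve_concepts(x, concept_bank, k=5):
    """Return indices of the k concepts most similar to input embedding x.

    Assumes x (d,) and concept_bank (n, d) live in a shared image-text
    embedding space (e.g., CLIP), which the paper's cross-modal search implies.
    """
    sims = concept_bank @ x / (
        np.linalg.norm(concept_bank, axis=1) * np.linalg.norm(x) + 1e-8
    )
    return np.argsort(-sims)[:k]

def concept_regression(x, C, lam=0.1, steps=200, lr=0.01):
    """Sparse linear regression of x onto retrieved concept embeddings C (k, d).

    Solves min_w 0.5 * ||C.T @ w - x||^2 + lam * ||w||_1 with ISTA
    (gradient step + soft-thresholding); a stand-in for any lasso solver.
    """
    w = np.zeros(C.shape[0])
    for _ in range(steps):
        grad = C @ (C.T @ w - x)          # gradient of the squared-error term
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # L1 shrinkage
    return w

# Toy usage: a synthetic bank where the input is one of the bank entries.
rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 32))         # 100 hypothetical concept embeddings
x = bank[7].copy()                        # input embedding
idx = retrieve_concepts(x, bank, k=5)     # top-5 related concepts
weights = concept_regression(x, bank[idx])
```

The nonzero entries of `weights` indicate which retrieved concepts the (hypothetical) final prediction would be attributed to; intervening on a concept would correspond to editing its weight before the concept-to-class step.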
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 10596