Keywords: generative models, interpretability, steerability, concept bottleneck, hard concepts, probabilistic models
TL;DR: We introduce the Variational Hard Concept Bottleneck (VHCB) for Concept Bottleneck Generative Models (CBGMs), improving steerability by mitigating concept leakage and enabling generation directly from specified concept configurations. We also propose a systematic evaluation framework for CBGMs.
Abstract: Concept Bottleneck Generative Models (CBGMs) incorporate a human-interpretable concept bottleneck layer, which makes them interpretable and steerable. However, designing such a layer for generative models poses the same challenges as in the supervised concept bottleneck setting, if not greater ones. Deterministic mappings from the model's inner representations to soft concepts in existing CBGMs: (i) limit steerable generation to modifying concepts in existing inputs; and, more importantly, (ii) are susceptible to *concept leakage*, which hinders their steerability. To address these limitations, we first introduce the Variational Hard Concept Bottleneck (VHCB) layer. The VHCB maps probabilistic estimates of binary latent variables to hard concepts, which have been shown to mitigate leakage. Remarkably, its probabilistic formulation enables direct generation from a specified set of concepts. Second, we propose a systematic evaluation framework for assessing the steerability of CBGMs across various tasks (e.g., activating and deactivating concepts). This framework allows us to empirically demonstrate that the VHCB layer consistently improves steerability.
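To make the abstract's core mechanism concrete, below is a minimal sketch of a hard concept bottleneck layer in PyTorch. The paper's exact VHCB formulation is not given here, so this is an illustrative assumption: it uses a straight-through Bernoulli relaxation (a common choice for binary bottlenecks), and the class name, method signatures, and architecture details are hypothetical, not the authors' implementation.

```python
# Illustrative sketch only: a hard concept bottleneck with a straight-through
# Bernoulli estimator. The authors' actual VHCB layer may differ.
import torch
import torch.nn as nn


class HardConceptBottleneck(nn.Module):
    """Maps encoder features to probabilistic estimates of binary concepts,
    then binarizes them so downstream layers see hard (0/1) concepts while
    gradients still flow through the soft probabilities."""

    def __init__(self, feature_dim: int, num_concepts: int):
        super().__init__()
        self.to_logits = nn.Linear(feature_dim, num_concepts)

    def forward(self, features: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        probs = torch.sigmoid(self.to_logits(features))  # q(c = 1 | x)
        hard = torch.bernoulli(probs)                    # sample hard concepts
        # Straight-through trick: forward pass uses the hard samples,
        # backward pass uses the gradient of the soft probabilities.
        concepts = hard + probs - probs.detach()
        return concepts, probs


# Because the decoder only ever consumes hard (0/1) concept vectors, one can
# also generate directly from a user-specified concept configuration,
# bypassing the encoder entirely:
bottleneck = HardConceptBottleneck(feature_dim=128, num_concepts=16)
config = torch.tensor([[1.0, 0.0] * 8])  # hypothetical concept configuration
# decoder(config)  # feed the configuration straight to the decoder
```

This sketch is meant only to show why a probabilistic hard bottleneck supports both goals stated in the abstract: hard concepts reduce the extra information that leaks through soft concept scores, and the binary concept space doubles as a direct generation interface.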
Primary Area: interpretability and explainable AI
Submission Number: 19121