Concept Bottleneck Generative Models

Published: 23 Jun 2023, Last Modified: 20 Jul 2023
Venue: DeployableGenerativeAI
Keywords: Interpretability, generative models
TL;DR: We extend Concept bottleneck models to generative models.
Abstract: Despite their increasing prevalence, generative models remain opaque and difficult to steer reliably. To address these challenges, we present concept bottleneck (CB) generative models, a type of generative model where one of its internal layers—a concept bottleneck (CB) layer—is constrained to encode human-understandable features. While concept bottleneck layers have been used to improve interpretability for supervised learning tasks, here we extend them to generative models. The concept bottleneck layer partitions the generative model into three parts: the pre-concept bottleneck portion, the CB layer, and the post-concept bottleneck portion. To train CB generative models, we complement the traditional task-based loss function for training generative models with three additional loss terms: a concept loss, an orthogonality loss, and a concept sensitivity loss. The CB layer and these corresponding loss terms are model agnostic, which we demonstrate by applying them to three different families of generative models: generative adversarial networks, variational autoencoders, and diffusion models. On real-world datasets, across three types of generative models, steering a generative model with the CB layer outperforms several baselines.
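The combined objective described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the exact forms of the concept, orthogonality, and sensitivity losses, the weight names (`w_c`, `w_o`, `w_s`), and the Gram-matrix form of the orthogonality penalty are all assumptions made for illustration.

```python
import numpy as np

def orthogonality_loss(C):
    """Illustrative orthogonality penalty (assumed form): sum of squared
    off-diagonal entries of the Gram matrix of the normalized concept
    activations C (shape: samples x concepts). Zero when concept
    dimensions are mutually orthogonal."""
    Cn = C / (np.linalg.norm(C, axis=0, keepdims=True) + 1e-8)
    gram = Cn.T @ Cn
    off_diag = gram - np.diag(np.diag(gram))
    return float(np.sum(off_diag ** 2))

def cb_objective(task_loss, concept_loss, ortho_loss, sens_loss,
                 w_c=1.0, w_o=1.0, w_s=1.0):
    """Total training loss: the generative model's task loss plus the
    three weighted CB loss terms (weights are hypothetical)."""
    return task_loss + w_c * concept_loss + w_o * ortho_loss + w_s * sens_loss
```

In this sketch, the concept loss would supervise the CB layer against concept labels and the sensitivity loss would encourage outputs to respond to concept edits; only their role in the weighted sum is shown here.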
Submission Number: 13