Unveiling Concept Attribution in Diffusion Models

Published: 11 Jun 2025 (Last Modified: 11 Jun 2025) · MUGen @ ICML 2025 Poster · CC BY 4.0
Keywords: generative models, diffusion models, interpretability, concept erasure
TL;DR: We study how the components of diffusion models store knowledge.
Abstract: Diffusion models have shown remarkable abilities in generating realistic, high-quality images from text prompts. However, a trained model remains largely a black box; little is known about the roles its components play in expressing a concept, such as an object or a style. In this work, we approach the interpretability of diffusion models from a general perspective and pose the question: "How do model components work jointly to demonstrate knowledge?" To answer it, we decompose diffusion models using component attribution, systematically unveiling the importance of each component (specifically, each model parameter) in generating a concept. Extensive experimental results validate the significance of both the positive and negative components pinpointed by our framework, demonstrating its potential to provide a complete view for interpreting generative models.
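For intuition only, here is a minimal sketch of ablation-style component attribution, not the paper's actual method or API; the names `components` and `concept_score` are hypothetical placeholders. The idea is to knock out one parameter group at a time and record how a concept metric shifts: a positive delta marks a component that promotes the concept, a negative delta one that suppresses it, matching the positive/negative components described in the abstract.

```python
import torch

@torch.no_grad()
def ablation_attribution(model, components, concept_score, prompt):
    """Score each component by how much ablating it changes a concept metric.

    components:    dict mapping a name to a parameter tensor inside `model`
                   (e.g., a cross-attention weight matrix). Hypothetical.
    concept_score: hypothetical callable (model, prompt) -> float measuring
                   how strongly generations for `prompt` express the concept.
    """
    baseline = concept_score(model, prompt)
    attributions = {}
    for name, param in components.items():
        saved = param.detach().clone()   # keep a copy of the weights
        param.zero_()                    # ablate (zero out) the component
        attributions[name] = baseline - concept_score(model, prompt)
        param.copy_(saved)               # restore the original weights
    return attributions
```

A positive entry in `attributions` indicates the component contributes to generating the concept; in practice, attribution methods of this kind trade off the cost of re-scoring the model once per component against the fidelity of the importance estimate.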
Submission Number: 42