Unveiling Concept Attribution in Diffusion Models

Published: 11 Jun 2025 (Last Modified: 11 Jun 2025) · MUGen @ ICML 2025 Poster · CC BY 4.0
Keywords: generative models, diffusion models, interpretability, concept erasure
TL;DR: We study how the components of diffusion models store knowledge.
Abstract: Diffusion models have shown remarkable abilities in generating realistic, high-quality images from text prompts. However, a trained model remains largely a black box; little is known about the roles its components play in expressing a concept, such as an object or a style. In this work, we approach the interpretability of diffusion models from a general perspective and pose the question: "How do model components work jointly to demonstrate knowledge?" To answer it, we decompose diffusion models using component attribution, systematically unveiling the importance of each component (specifically, each model parameter) in generating a concept. Extensive experimental results validate the significance of both the positive and negative components pinpointed by our framework, demonstrating its potential to provide a complete view for interpreting generative models.
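For intuition only, here is a minimal sketch of ablation-style component attribution, not the paper's actual method or API; the names `components` and `concept_score` are hypothetical placeholders. The idea is to knock out one parameter group at a time and record how a concept metric shifts: a positive delta marks a component that promotes the concept, a negative delta one that suppresses it, matching the positive/negative components described in the abstract.

```python
import torch

@torch.no_grad()
def ablation_attribution(model, components, concept_score, prompt):
    """Score each component by how much ablating it changes a concept metric.

    components:    dict mapping a name to a parameter tensor inside `model`
                   (e.g., a cross-attention weight matrix). Hypothetical.
    concept_score: hypothetical callable (model, prompt) -> float measuring
                   how strongly generations for `prompt` express the concept.
    """
    baseline = concept_score(model, prompt)
    attributions = {}
    for name, param in components.items():
        saved = param.detach().clone()   # keep a copy of the weights
        param.zero_()                    # ablate (zero out) the component
        attributions[name] = baseline - concept_score(model, prompt)
        param.copy_(saved)               # restore the original weights
    return attributions
```

A positive entry in `attributions` indicates the component contributes to generating the concept; in practice, attribution methods of this kind trade off the cost of re-scoring the model once per component against the fidelity of the importance estimate.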
Submission Number: 42