The Journey, Not the Destination: How Data Guides Diffusion Models

Published: 23 Jun 2023, Last Modified: 11 Jul 2023, DeployableGenerativeAI
Keywords: data attribution, diffusion models, influence estimation, memorization, privacy, data valuation
TL;DR: We formalize the problem of data attribution for diffusion models and propose an efficient attribution method.
Abstract: Diffusion-based generative models can synthesize photo-realistic images of remarkable quality and diversity. However, *attributing* these images back to the training data (that is, identifying the specific training examples that caused an image to be generated) remains a challenge. In this paper, we propose a framework that: (i) formalizes data attribution in the context of diffusion models, (ii) provides a method for computing such attributions efficiently, and (iii) allows us to *counterfactually* validate them. We then apply our framework to the CIFAR-10 and MS COCO datasets.
Submission Number: 51