Keywords: data attribution, diffusion models, influence estimation, memorization, privacy, data valuation
TL;DR: We formalize the problem of data attribution for diffusion models and propose an efficient attribution method.
Abstract: Diffusion-based generative models can synthesize photo-realistic images of
remarkable quality and diversity. However, *attributing* these images back
to the training data---that is, identifying specific training examples which
caused an image to be generated---remains a challenge. In this paper, we propose
a framework that: (i) frames data attribution in the context of diffusion
models, (ii) provides a method for computing such attributions efficiently, and
(iii) allows us to *counterfactually* validate them. We then apply our
framework to the CIFAR-10 and MS COCO datasets.
Submission Number: 51