Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: diffusion models, data attribution, data valuation, generative models
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a framework for attributing synthetically generated images back to training data, and provide a method for computing such attributions efficiently.
Abstract: Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity. However, attributing these images back to the training data (that is, identifying the specific training examples that caused an image to be generated) remains a challenge. In this paper, we propose a framework that (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions. Then, we provide a method for computing these attributions efficiently. Finally, we apply our method to find (and evaluate) such attributions for denoising diffusion probabilistic models trained on CIFAR-10 and latent diffusion models trained on MS COCO.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6016