Keywords: Diffusion Model; Data Attribution; Training Data Influence
TL;DR: We propose the Diffusion Attribution Score (DAS) for diffusion models, which evaluates the influence of individual training samples on generated images. Our method achieves state-of-the-art performance compared with baseline methods.
Abstract: As diffusion models advance, the scientific community is actively developing methods to curb the misuse of generative models, aiming to prevent the reproduction of copyrighted, explicitly violent, or personally sensitive information in generated images.
One strategy is to identify the contribution of training samples to generative models by evaluating their influence on the generated images, a task known as data attribution.
Existing data attribution approaches for diffusion models represent the contribution of a specific training sample by the change in the diffusion loss when that sample is included in versus excluded from training.
However, we argue that the diffusion loss cannot represent such a contribution accurately because of how it is computed: these approaches measure the divergence between the predicted and ground-truth distributions, which yields only an indirect comparison between the predicted distributions and cannot capture the differences in model behaviour.
To address this issue, we directly compare the predicted distributions with an attribution score that quantifies the importance of each training sample, which we call the Diffusion Attribution Score (DAS).
Underpinned by rigorous theoretical analysis, we elucidate the effectiveness of DAS.
Additionally, we explore strategies to accelerate DAS calculations, facilitating its application to large-scale diffusion models.
Our extensive experiments across various datasets and diffusion models demonstrate that DAS significantly surpasses previous baselines in terms of the linear datamodeling score, establishing new state-of-the-art performance.
Code is available at https://anonymous.4open.science/r/Diffusion-Attribution-Score-411F.
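The contrast drawn in the abstract, between loss-based attribution and a direct comparison of the models' predicted distributions, can be sketched in a few lines of code. This is a minimal illustration only, not the paper's implementation: the toy noise schedule, the stand-in predictors, and the helper names (forward_noise, loss_change_score, direct_prediction_score) are placeholders, and the actual DAS rests on the theoretical analysis and acceleration strategies described in the paper.

```python
import torch

def forward_noise(x0, t, noise, alpha_bar):
    """DDPM-style forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bar[t]
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

def diffusion_loss(model, x0, t, noise, alpha_bar):
    """Standard epsilon-prediction loss against the ground-truth noise."""
    x_t = forward_noise(x0, t, noise, alpha_bar)
    return torch.mean((model(x_t, t) - noise) ** 2)

def loss_change_score(model_full, model_loo, x0, t, noise, alpha_bar):
    """Prior attribution idea: change in diffusion loss when the training sample
    of interest is held out (model_loo) versus included (model_full)."""
    return (diffusion_loss(model_loo, x0, t, noise, alpha_bar)
            - diffusion_loss(model_full, x0, t, noise, alpha_bar))

def direct_prediction_score(model_full, model_loo, x0, t, noise, alpha_bar):
    """DAS-flavoured idea: compare the two models' predicted noise directly,
    rather than comparing each prediction to the ground-truth noise."""
    x_t = forward_noise(x0, t, noise, alpha_bar)
    return torch.mean((model_full(x_t, t) - model_loo(x_t, t)) ** 2)

# Toy usage with placeholder "models" (the paper works with real diffusion U-Nets).
if __name__ == "__main__":
    torch.manual_seed(0)
    alpha_bar = torch.linspace(0.999, 0.01, 1000)   # toy noise schedule
    x0, noise, t = torch.randn(4, 8), torch.randn(4, 8), 500
    model_full = lambda x, t: 0.9 * x               # placeholder noise predictors
    model_loo = lambda x, t: 0.8 * x
    print(loss_change_score(model_full, model_loo, x0, t, noise, alpha_bar))
    print(direct_prediction_score(model_full, model_loo, x0, t, noise, alpha_bar))
```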
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5713