Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We study to what extent it is possible to train powerful generative models without memorizing the training set.
Abstract: There is strong empirical evidence that the state-of-the-art diffusion modeling paradigm leads to models that memorize the training set, especially when the training set is small. Prior methods to mitigate the memorization problem often lead to a decrease in image quality. Is it possible to obtain strong and creative generative models, i.e., models that achieve high generation quality and low memorization? Despite the current pessimistic landscape of results, we make significant progress in pushing the trade-off between fidelity and memorization. We first provide theoretical evidence that memorization in diffusion models is only necessary for denoising problems at low noise scales (usually used in generating high-frequency details). Using this theoretical insight, we propose a simple, principled method to train diffusion models using noisy data at large noise scales. We show that our method significantly reduces memorization without decreasing image quality, for both text-conditional and unconditional models and for a variety of data availability settings.
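The core trick the abstract alludes to, training at large noise scales from data that is itself noisy, rests on a standard Gaussian identity: a sample already corrupted at level sigma_min can be pushed to any larger level sigma by adding only the extra noise sqrt(sigma^2 - sigma_min^2). The sketch below illustrates that identity in numpy; the function name `ambient_training_pair` and all specifics are hypothetical, not the paper's actual training objective or code.

```python
import numpy as np

def ambient_training_pair(x_noisy, sigma_min, sigma, rng):
    """Lift data already corrupted at noise level sigma_min to a larger
    noise level sigma >= sigma_min by adding only the extra Gaussian noise.
    Illustrative helper; not the paper's implementation."""
    assert sigma >= sigma_min, "can only train at noise scales above sigma_min"
    extra_std = np.sqrt(sigma**2 - sigma_min**2)
    return x_noisy + extra_std * rng.standard_normal(x_noisy.shape)

rng = np.random.default_rng(0)
# Stand-in for clean training data (never observed by the model here).
x_clean = rng.standard_normal((1000, 4))
sigma_min = 0.5
# The learner only ever sees data pre-corrupted at sigma_min ...
x_noisy = x_clean + sigma_min * rng.standard_normal(x_clean.shape)
# ... yet can still form valid inputs for any larger noise scale.
x_t = ambient_training_pair(x_noisy, sigma_min, sigma=1.0, rng=rng)
```

Because the two noise draws are independent, `x_t` is distributed exactly as the clean data plus noise at level sigma (here, total variance 1 + 1.0^2 = 2 for unit-variance data), so denoisers at large noise scales can be trained without ever accessing clean samples.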
Lay Summary: Diffusion models have emerged as a prominent method for generating novel images. However, these models can sometimes memorize their training data, producing outputs that are mere variations of training images rather than genuinely new images generated by learning the underlying structure of natural images. This raises crucial privacy and ethical questions. Prior attempts to mitigate this memorization often resulted in a decline in the quality of the generated images. In this work, we introduce a simple approach based on learning diffusion models with noisy data that not only reduces memorization but also improves the quality of the generated images.
Link To Code: https://github.com/kulinshah98/memorization_noisy_data
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: diffusion, memorization, corrupted data, limited samples, generative models, ambient diffusion
Submission Number: 13735