Keywords: Noise Prompt, Noise Prompt Learning, Image Synthesis, Diffusion Models
TL;DR: We introduce a new concept called the "noise prompt" and are the first to formulate the "noise prompt learning" framework.
Abstract: Text-to-image diffusion models are a popular paradigm that synthesizes personalized images from a text prompt and a random Gaussian noise. While it has been observed that some noises are "winning tickets" that achieve better text-image alignment and higher human preference than others, a machine learning framework for obtaining such winning noises has been lacking. To learn winning noises for diffusion sampling, we make three main contributions in this paper. First, we identify a new concept termed the $\textit{noise prompt}$, which aims to turn a random Gaussian noise into a winning noise by adding a small desirable perturbation derived from the text prompt. Building on this concept, we are the first to formulate the $\textit{noise prompt learning}$ framework, which systematically learns "prompted" winning noises associated with a text prompt for diffusion models. Second, we design a noise prompt data collection pipeline and collect a large-scale $\textit{noise prompt dataset}$ (NPD) containing 100k pairs of random noises and winning noises with their associated text prompts. Using NPD as the training dataset, we train a small $\textit{noise prompt network}$ (NPNet) that directly learns to transform a random noise into a winning noise. The learned perturbation can be regarded as a prompt for the noise, as it is rich in semantic information and tailored to the given text prompt. Third, our extensive experiments demonstrate the effectiveness and generalization of NPNet in improving the quality of images synthesized by various diffusion models, including SDXL, DreamShaper-xl-v2-turbo, and Hunyuan-DiT. Moreover, NPNet is a small and efficient plug-and-play module that incurs very limited additional inference and computational costs, as it merely supplies a winning noise in place of a random noise without modifying the original pipeline.
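To make the plug-and-play idea concrete, here is a minimal sketch of how such a noise prompt network could be wired into a sampling pipeline. The architecture, dimensions, and names (`NPNetSketch`, the MLP design, the 768-dim text embedding) are illustrative assumptions, not the paper's actual model; the only property taken from the abstract is that the network adds a small text-conditioned perturbation to the initial noise and leaves the diffusion pipeline itself untouched.

```python
import torch
import torch.nn as nn

class NPNetSketch(nn.Module):
    """Hypothetical noise prompt network (NOT the paper's architecture).

    Maps a random Gaussian initial latent plus a text-prompt embedding
    to a small additive perturbation -- the learned "noise prompt".
    """
    def __init__(self, latent_shape=(4, 128, 128), text_dim=768, hidden=1024):
        super().__init__()
        latent_dim = latent_shape[0] * latent_shape[1] * latent_shape[2]
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + text_dim, hidden),
            nn.SiLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, noise, text_emb):
        # noise: (B, 4, 128, 128) initial latent; text_emb: (B, 768)
        x = torch.cat([noise.flatten(1), text_emb], dim=1)
        delta = self.mlp(x).view_as(noise)  # small text-conditioned perturbation
        return noise + delta                # the "winning" noise ticket

# Plug-and-play usage: only the initial latent changes; the diffusion
# pipeline and its weights are untouched.
npnet = NPNetSketch()
noise = torch.randn(1, 4, 128, 128)   # standard SDXL latent shape at 1024x1024
text_emb = torch.randn(1, 768)        # stand-in for a real text encoder output
winning_noise = npnet(noise, text_emb)
# image = pipe(prompt, latents=winning_noise).images[0]  # e.g. a diffusers SDXL pipeline
```

Because the module only replaces the starting noise, the extra cost is a single small forward pass before sampling begins, which is consistent with the limited-overhead claim in the abstract.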
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3603