TL;DR: This work studies inference-time alignment of diffusion generative models with downstream objectives.
Abstract: In this work, we focus on the problem of aligning diffusion models with a continuous reward function that encodes specific objectives for downstream tasks, such as increasing darkness or improving the aesthetics of images. The central goal of the alignment problem is to adjust the distribution learned by diffusion models so that generated samples maximize the target reward function. We propose a novel alignment approach, named Direct Noise Optimization (DNO), that optimizes the noise injected during the sampling process of diffusion models. By design, DNO operates at inference time, and is thus tuning-free and prompt-agnostic, with alignment occurring in an online fashion during generation. We rigorously study the theoretical properties of DNO and propose variants that handle non-differentiable reward functions. Furthermore, we identify that a naive implementation of DNO occasionally suffers from out-of-distribution reward hacking, where optimized samples attain high rewards but fall outside the support of the pretrained distribution. To remedy this issue, we leverage classical high-dimensional statistics to devise an effective probability regularization technique. We conduct extensive experiments on several important reward functions and demonstrate that DNO achieves state-of-the-art reward scores within a reasonable time budget for generation.
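Below is a minimal sketch of the core DNO loop described in the abstract, not the authors' exact implementation. It assumes a differentiable sampler `sample` (e.g., a deterministic DDIM/ODE solver wrapping the pretrained model) that maps noise to an image, a differentiable reward `reward_fn`, and that gradients flow through the sampler; all names, hyperparameters, and the specific form of the probability regularizer (a penalty keeping the noise norm near the Gaussian concentration shell, motivated by the abstract's appeal to high-dimensional statistics) are illustrative assumptions.

```python
import math
import torch


def direct_noise_optimization(sample, reward_fn, noise_shape,
                              steps=100, lr=0.01, reg_weight=0.1,
                              device="cuda"):
    """Optimize the injected noise so the generated sample maximizes a reward.

    sample:    hypothetical differentiable sampler, noise -> image
               (e.g., a DDIM ODE solver around a pretrained diffusion model).
    reward_fn: hypothetical differentiable reward on generated images.
    """
    d = math.prod(noise_shape)  # ambient dimension of the noise vector
    noise = torch.randn(noise_shape, device=device, requires_grad=True)
    opt = torch.optim.Adam([noise], lr=lr)

    for _ in range(steps):
        img = sample(noise)          # forward pass through the sampler
        reward = reward_fn(img)
        # Probability regularization (assumed form): a standard Gaussian in
        # high dimension concentrates on the sphere of radius sqrt(d), so
        # penalizing drift from that shell keeps the optimized noise in the
        # high-probability region and mitigates out-of-distribution hacking.
        shell_penalty = (noise.norm() - math.sqrt(d)) ** 2
        loss = -reward + reg_weight * shell_penalty
        opt.zero_grad()
        loss.backward()
        opt.step()

    return sample(noise.detach())
```

Because only the noise is updated, the pretrained model weights stay frozen, which is what makes the approach tuning-free and applicable to any prompt at generation time.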
Lay Summary: AI image generators, like diffusion models, often feel like playing the lottery—you enter a prompt and wait, hoping the result matches what you imagined. Many people have to try again and again with different random seeds until they get a satisfying image. This trial-and-error process can be frustrating, time-consuming, and inefficient.
What if we could automate this “lottery” and make the first image good enough? Our research shows this is possible—if we have a way to measure image quality automatically. We introduce a new method, called Direct Noise Optimization (DNO), that tweaks the internal randomness during image generation to maximize a reward signal—like how beautiful, dark, or safe the image looks. Unlike other methods, ours doesn’t require retraining the model or large datasets. It runs on regular hardware and works with any prompt.
This means anyone—from artists to engineers—can generate better images with fewer tries, saving time and compute. Our method helps make AI image tools more reliable and controllable, pushing them closer to everyday creative use.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/TZW1998/Direct-Noise-Optimization
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Diffusion Models, Inference-Time Alignment, Optimization, RLHF
Submission Number: 7098