Test-Time Scaling of Diffusion Models via Noise Trajectory Search

Published: 18 Sept 2025, Last Modified: 29 Oct 2025. NeurIPS 2025 poster. License: CC BY 4.0
Keywords: diffusion model, test-time scaling, Markov decision process, contextual bandit, optimization, search, MCTS, epsilon-greedy
TL;DR: We present an algorithm for test-time scaling of SDE-based diffusion models that searches for noise trajectories optimizing arbitrary rewards, empirically matching or exceeding MCTS performance.
Abstract: The iterative and stochastic nature of diffusion models enables *test-time scaling*, whereby spending additional compute during denoising generates higher-fidelity samples. Increasing the number of denoising steps is the primary scaling axis, but it yields quickly diminishing returns. Optimizing the *noise trajectory*, i.e., the sequence of injected noise vectors, is more promising, since the specific noise realizations critically affect sample quality; but it is challenging due to the high-dimensional search space, complex noise-outcome interactions, and costly trajectory evaluations. We address this by first casting diffusion as a Markov Decision Process (MDP) with a terminal reward, showing that tree-search methods such as Monte Carlo tree search (MCTS) are effective but computationally impractical. To balance performance and efficiency, we then relax the MDP, viewing denoising as a sequence of independent *contextual bandits*. This allows us to introduce an $\epsilon$-greedy search algorithm that *globally explores* at extreme timesteps and *locally exploits* during the intermediate steps where de-mixing occurs. Experiments on EDM and Stable Diffusion show state-of-the-art scores for class-conditioned and text-to-image generation, exceeding baselines by up to 164% and matching or exceeding MCTS performance. To our knowledge, this is the first practical method for test-time *noise-trajectory* optimization under *arbitrary (non-differentiable)* rewards.
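To make the bandit relaxation concrete, here is a minimal sketch of per-step $\epsilon$-greedy noise selection in the spirit of the abstract. It assumes NumPy and placeholder callables `denoise_step(x, t, z)` (one SDE denoising step with injected noise `z`) and `reward_fn(x)`; the function name, candidate counts, exploration schedule, and local-perturbation rule are all illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def eps_greedy_noise_search(x_T, denoise_step, reward_fn, timesteps,
                            n_candidates=8, eps_extreme=1.0, eps_mid=0.2,
                            local_scale=0.3, n_extreme=2, seed=0):
    """Select the injected noise step-by-step, treating each denoising step
    as an independent contextual bandit and committing greedily."""
    rng = np.random.default_rng(seed)
    x = x_T
    for i, t in enumerate(timesteps):
        # High epsilon (global Gaussian resampling) at the extreme timesteps,
        # low epsilon (local perturbation of the incumbent) in between.
        is_extreme = i < n_extreme or i >= len(timesteps) - n_extreme
        eps = eps_extreme if is_extreme else eps_mid
        best = None  # (reward, noise, next_state)
        for _ in range(n_candidates):
            if best is None or rng.random() < eps:
                z = rng.standard_normal(x.shape)  # explore: fresh Gaussian noise
            else:
                # exploit: perturb the incumbent, renormalized to ~unit variance
                z = best[1] + local_scale * rng.standard_normal(x.shape)
                z /= np.sqrt(1.0 + local_scale ** 2)
            x_next = denoise_step(x, t, z)
            r = reward_fn(x_next)  # stand-in for a terminal-reward estimate
            if best is None or r > best[0]:
                best = (r, z, x_next)
        x = best[2]  # commit the best candidate; no backtracking across steps
    return x

# Toy usage on a 1-D "diffusion": drift toward zero plus scaled injected noise.
toy_step = lambda x, t, z: 0.9 * x + 0.1 * t * z
toy_reward = lambda x: -float(np.abs(x).mean())  # prefer samples near zero
sample = eps_greedy_noise_search(np.random.randn(4), toy_step, toy_reward,
                                 timesteps=np.linspace(1.0, 0.0, 10))
```

Unlike MCTS, this keeps only one incumbent per step and never backtracks, so the cost is linear in the number of steps times candidates; the trade-off is that each step's choice is made with only a local reward estimate.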
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 26024