NoiseAR: AutoRegressing Initial Noise Prior for Diffusion Models

Zeming Li; Xiangyue Liu; Xiangyu Zhang; Ping Tan; Heung-Yeung Shum

14 Sept 2025 (modified: 11 Feb 2026); Submitted to ICLR 2026; CC BY 4.0

Keywords: Generative Model, Diffusion Model, Text to Image Generation

TL;DR: This paper introduces NoiseAR, a method that learns a controllable, structured initial noise for diffusion models using an autoregressive framework, leading to improved sample quality and better consistency with text prompts.

Abstract: Diffusion models have emerged as powerful generative frameworks, creating data samples by progressively denoising an initial random state. Traditionally, this initial state is sampled from a simple, fixed distribution such as an isotropic Gaussian, which inherently lacks structure and offers no direct mechanism for external control. While recent efforts have explored ways to introduce controllability into the diffusion process, particularly at the initialization stage, they often rely on deterministic or heuristic approaches. These methods can be suboptimal, lack expressiveness, and be difficult to scale or integrate into more sophisticated optimization frameworks. In this paper, we introduce NoiseAR, a novel method that learns an autoregressive initial noise prior for diffusion models. Instead of a static, unstructured source, NoiseAR learns to generate a dynamic and controllable prior distribution for the initial noise. We formulate the generation of the initial noise prior's parameters as an autoregressive probabilistic modeling task over spatial patches or tokens. This approach enables NoiseAR to capture complex spatial dependencies and introduce learned structure into the initial state. Crucially, NoiseAR is designed to be conditional, allowing text prompts to directly influence the learned prior and thereby achieving fine-grained control over the diffusion initialization. Our experiments demonstrate that NoiseAR can generate initial noise priors that lead to improved sample quality and enhanced consistency with conditional inputs, offering a powerful, learned alternative to traditional random initialization. A key advantage of NoiseAR is its probabilistic formulation, which naturally supports integration into probabilistic frameworks such as Markov Decision Processes (MDPs) and Reinforcement Learning (RL). This integration opens promising avenues for further optimizing and scaling controllable generation for downstream tasks. Furthermore, NoiseAR acts as a lightweight, plug-and-play module that adds minimal computational overhead during inference, making it easy to integrate into existing diffusion pipelines.
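
The autoregressive formulation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear "parameter head", the weight matrices `W_ctx` and `W_txt`, the diagonal-Gaussian form of each patch's distribution, and all dimensions are assumptions chosen only to make the idea concrete. Each patch of the initial noise is sampled from a Gaussian whose mean and log standard deviation depend on the previously sampled patches and on a text embedding, and the log-probability of the whole sample is accumulated along the way, which is the kind of quantity a policy-gradient style RL objective would typically need.

```python
import numpy as np

def toy_param_head(prev_patches, text_emb, W_ctx, W_txt):
    """Hypothetical stand-in for a learned network: maps the patches sampled
    so far plus a text embedding to (mean, log_std) of the next patch."""
    d = W_txt.shape[1] // 2  # patch dimension
    if len(prev_patches) > 0:
        ctx = np.mean(prev_patches, axis=0)   # crude summary of the context
    else:
        ctx = np.zeros(W_ctx.shape[0])        # no context for the first patch
    h = ctx @ W_ctx + text_emb @ W_txt        # shape (2 * d,)
    mean, log_std = h[:d], np.tanh(h[d:])     # bounded log-std (an assumption)
    return mean, log_std

def sample_initial_noise(text_emb, num_patches, patch_dim, rng, W_ctx, W_txt):
    """Sample patches one at a time and accumulate the joint log-probability."""
    patches, log_prob = [], 0.0
    for _ in range(num_patches):
        mean, log_std = toy_param_head(patches, text_emb, W_ctx, W_txt)
        eps = rng.standard_normal(patch_dim)
        z = mean + np.exp(log_std) * eps      # reparameterized Gaussian sample
        # Log density of a diagonal Gaussian evaluated at z.
        log_prob += np.sum(-0.5 * eps**2 - log_std - 0.5 * np.log(2 * np.pi))
        patches.append(z)
    return np.stack(patches), log_prob

# Illustrative usage with random (untrained) weights.
rng = np.random.default_rng(0)
patch_dim, text_dim, num_patches = 4, 8, 16
W_ctx = 0.1 * rng.standard_normal((patch_dim, 2 * patch_dim))
W_txt = 0.1 * rng.standard_normal((text_dim, 2 * patch_dim))
text_emb = rng.standard_normal(text_dim)
noise, logp = sample_initial_noise(text_emb, num_patches, patch_dim, rng, W_ctx, W_txt)
```

With all weights set to zero, the sketch reduces to ordinary standard-Gaussian initialization, so a learned prior of this shape can be viewed as a structured, prompt-dependent generalization of the usual random start.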

Supplementary Material: zip

Primary Area: generative models

Submission Number: 4997
