Keywords: auto-regressive models, diffusion models, condition capturing
Abstract: Autoregressive (AR) diffusion models have recently attracted significant attention for their ability to generate high-quality, diverse samples across various tasks involving text, image, and video generation. Despite this surge of interest, the theoretical underpinnings of AR diffusion remain largely unexplored.
This work, for the first time, investigates the inference complexity and underlying mechanisms behind AR diffusion's strong performance. Building on the sequential patch-by-patch generation paradigm, we formalize the inference process as a series of stage-wise conditional distribution samplings. This formulation shows that when each conditional component is learned accurately, the resulting approximation to the full joint distribution is also accurate. Our theoretical analysis establishes an inference complexity bound for AR diffusion with a general number of stages $K$, requiring only minimal smoothness assumptions on the score functions and their estimation error.
The bound includes an additional factor proportional to the number of stages, reflecting the model's sequential architecture. On the other hand, we show that this stage-wise design can be advantageous for learning specific conditional dependencies between patches, which may be overlooked by conventional diffusion models that target only the joint distribution. Experiments on synthetic data validate this theoretical insight.
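The stage-wise factorization described above can be illustrated with a minimal sketch. The code below is purely hypothetical and not the paper's implementation: it samples $K$ patches autoregressively, so the joint distribution factorizes as $p(x_{1:K}) = \prod_k p(x_k \mid x_{<k})$; the `toy_conditional` stands in for a learned per-stage conditional sampler (in AR diffusion, each stage would run a reverse diffusion process conditioned on the previously generated patches).

```python
import numpy as np

def sample_ar_stagewise(K, cond_sample, rng):
    """Sample K patches autoregressively: x_k ~ p(x_k | x_1, ..., x_{k-1}).

    `cond_sample(prefix, rng)` is any per-stage conditional sampler; in
    AR diffusion it would be a conditional reverse-diffusion process.
    """
    patches = []
    for _ in range(K):
        patches.append(cond_sample(patches, rng))
    return patches

def toy_conditional(prefix, rng, rho=0.8):
    # Hypothetical stand-in: a scalar Gaussian patch whose mean depends
    # on the previous patch (an AR(1)-style inter-patch dependency that
    # a joint model would have to capture implicitly).
    mean = rho * prefix[-1] if prefix else 0.0
    return mean + rng.standard_normal()

rng = np.random.default_rng(0)
xs = sample_ar_stagewise(K=4, cond_sample=toy_conditional, rng=rng)
print(len(xs))  # 4 patches, generated one stage at a time
```

Because each stage conditions explicitly on the generated prefix, inter-patch dependencies are represented directly rather than implicitly through a single joint score, which is the mechanism the abstract argues can be advantageous.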
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 16971