Multi-scale Autoregressive Models are Laplacian, Discrete, and Latent Diffusion Models In Disguise

Published: 23 Sept 2025, Last Modified: 23 Dec 2025 · SPIGM @ NeurIPS · CC BY 4.0
Keywords: VAR, DDPM, multi-scale, iterative refinement, diffusion
Abstract: We revisit Visual Autoregressive (VAR) models through the lens of iterative refinement. Instead of viewing VAR solely as next-scale autoregression, we formalise a deterministic forward process that builds a Laplacian-like latent pyramid and a learned backward process that predicts residual code maps in a small number of coarse-to-fine steps. This perspective connects VAR to denoising diffusion, clarifies where supervision enters, and isolates three design choices that may explain efficiency and fidelity: operating in a compact latent space, casting prediction as discrete classification over code indices, and partitioning the task by spatial frequency. Using small, controlled MNIST surrogates with matched budgets, we test these hypotheses and observe consistent trends favoring latent refinement, discrete targets, and two-stage coarse-to-fine specialisation. We also discuss how the same iterative-refinement template extends to permutation-invariant graph generation and to probabilistic, ensemble-style medium-range weather forecasting. The framework suggests practical ways to transfer tools from diffusion to VAR while keeping the few-step, scale-parallel generation that makes VAR appealing.
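The deterministic forward process the abstract describes admits a compact illustration. Below is a minimal sketch, assuming a continuous latent map and a power-of-two scale schedule; VAR itself quantises each scale's residual to discrete code indices with a multi-scale VQ tokeniser, which this continuous surrogate omits. The function name `laplacian_latent_pyramid` and the `scales` argument are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

def laplacian_latent_pyramid(z, scales=(1, 2, 4, 8)):
    """Deterministic 'forward process': decompose a latent map z of shape
    [B, C, H, W] into coarse-to-fine residuals, one per scale.

    Returns the per-scale residual maps and the final accumulated
    reconstruction at full resolution.
    """
    residuals = []
    acc = torch.zeros_like(z)  # running reconstruction at full resolution
    for s in scales:
        target = z - acc  # what remains unexplained by coarser scales
        coarse = F.interpolate(target, size=(s, s), mode="area")  # downsample residual
        up = F.interpolate(coarse, size=z.shape[-2:],
                           mode="bilinear", align_corners=False)  # lift back to full res
        residuals.append(coarse)  # the residual map supervised at this scale
        acc = acc + up            # one coarse-to-fine refinement step
    return residuals, acc

# Example: a 4-step decomposition of a random 8x8 latent map.
z = torch.randn(1, 16, 8, 8)
residuals, recon = laplacian_latent_pyramid(z)
print([r.shape[-1] for r in residuals])     # [1, 2, 4, 8]
print(torch.allclose(recon, z, atol=1e-5))  # True: finest scale closes the gap
```

In this reading, the learned backward process predicts each residual map from the accumulated coarser reconstruction, so each scale plays the role of one denoising step; this is where the diffusion analogy in the abstract enters.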
Submission Number: 125