Keywords: VAR, DDPM, multi-scale, iterative refinement, diffusion
Abstract: We revisit Visual Autoregressive (VAR) models through the lens of iterative refinement. Instead of viewing VAR solely as next-scale autoregression, we formalise a deterministic forward process that builds a Laplacian-like latent pyramid and a learned backward process that predicts residual code maps in a small number of coarse-to-fine steps. This perspective connects VAR to denoising diffusion, clarifies where supervision enters, and isolates three design choices that may explain its efficiency and fidelity: operating in a compact latent space, casting prediction as discrete classification over code indices, and partitioning the task by spatial frequency. Using small, controlled MNIST surrogates with matched budgets, we test these hypotheses and observe consistent trends favouring latent refinement, discrete targets, and two-stage coarse-to-fine specialisation. We also discuss how the same iterative-refinement template extends to permutation-invariant graph generation and to probabilistic, ensemble-style medium-range weather forecasting. The framework suggests practical ways to transfer tools from diffusion to VAR while keeping the few-step, scale-parallel generation that makes VAR appealing.
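To make the deterministic forward process concrete, the sketch below builds a Laplacian-like latent pyramid over a toy latent map: at each scale the running residual is downsampled, quantized against a codebook, and the dequantized reconstruction is subtracted, so finer scales carry only what coarser scales missed. This is a minimal illustration under assumed details, not the paper's implementation; the scale schedule, codebook size, and nearest-neighbour quantizer/resizer are all placeholder assumptions.

```python
# Minimal sketch (not the authors' code) of a deterministic forward process
# that turns a latent map into coarse-to-fine residual code maps.
# Codebook, scale schedule, and nearest-neighbour ops are assumptions.
import numpy as np

def quantize(z, codebook):
    """Assign each latent vector to its nearest codebook entry.

    z: (H, W, C) latent map; codebook: (K, C) embedding table.
    Returns (index map, dequantized reconstruction)."""
    flat = z.reshape(-1, z.shape[-1])                          # (H*W, C)
    d = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)                                     # (H*W,)
    recon = codebook[idx].reshape(z.shape)
    return idx.reshape(z.shape[:2]), recon

def resize(z, hw):
    """Nearest-neighbour resize of an (H, W, C) map to (h, w, C)."""
    h, w = hw
    rows = (np.arange(h) * z.shape[0] / h).astype(int)
    cols = (np.arange(w) * z.shape[1] / w).astype(int)
    return z[rows][:, cols]

def forward_pyramid(z, codebook, scales=((1, 1), (2, 2), (4, 4), (8, 8))):
    """Deterministic forward process: per-scale residual code maps.

    Each scale quantizes the downsampled residual and subtracts its
    upsampled reconstruction, so the code maps form a Laplacian-like
    pyramid partitioned by spatial frequency."""
    residual, codes = z, []
    for hw in scales:
        coarse = resize(residual, hw)
        idx, recon = quantize(coarse, codebook)
        codes.append(idx)
        residual = residual - resize(recon, residual.shape[:2])
    return codes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 8, 4))        # toy latent map
    codebook = rng.normal(size=(16, 4))   # toy 16-entry codebook
    codes = forward_pyramid(z, codebook)
    print([c.shape for c in codes])       # (1,1), (2,2), (4,4), (8,8)
```

The learned backward process would then be trained to predict each discrete code map from the coarser scales, which is where the abstract's casting of prediction as classification over code indices enters.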
Submission Number: 125