Keywords: Visual Autoregressive Models (VARs); Image Generation; Adversarial Learning; Generative Models
TL;DR: This paper introduces a lightweight, plug-and-play module that uses adversarial guidance to improve the global coherence of large, frozen generative models.
Abstract: Visual Autoregressive (VAR) models, despite their strong generative capabilities, accumulate local prediction errors across scales, leading to detail loss and local distortions. To address this, we introduce AID-VAR, a plug-and-play method that improves pretrained VARs via Adversarially Injected Diagnosis. Inspired by GANs, we train a discriminator to detect visual errors in generated samples and use an adversarial objective to pull generations toward the manifold of real images. To avoid the computational and stability issues of directly updating the VAR, we attach a lightweight guidance injector that conditions on the previously generated scales of a pretrained, frozen VAR and injects adversarial features to guide the next scale. To quantify reductions in cross-scale errors, we introduce the Inter-Scale Consistency Score (ISCS), which measures the fidelity of transitions between consecutive scales. Across standard VAR backbones, AID-VAR delivers sharper details, fewer local distortions, and stronger global coherence at low computational cost, with negligible additional parameters and minimal overhead. Our results establish AID-VAR as a practical pathway for upgrading large VAR generators with adversarial feedback, without modifying training data, base architecture, or sampling schedules. For instance, our AID-VAR-d20 improves FID by 16% with only a 3% increase in parameters.
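The frozen-backbone-plus-injector idea described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `frozen_var_step`, `GuidanceInjector`, `discriminator`, the 1-D "scales", and the non-saturating GAN loss are hypothetical stand-ins, not the paper's architecture or objective, which the abstract does not specify.

```python
import math
import random


def frozen_var_step(prev_scales, size):
    """Hypothetical stand-in for a frozen, pretrained VAR predicting the next scale.

    Here it simply upsamples the last scale by nearest-neighbour repetition.
    """
    last = prev_scales[-1]
    return [last[i * len(last) // size] for i in range(size)]


class GuidanceInjector:
    """Tiny trainable module (assumed form): a gain and a bias on a context feature."""

    def __init__(self, gain=0.0, bias=0.0):
        self.gain = gain
        self.bias = bias

    def __call__(self, prev_scales, size):
        # Condition on the previously generated scale via its mean value.
        last = prev_scales[-1]
        context = sum(last) / len(last)
        return [self.gain * context + self.bias] * size


def generate(injector, scales=(1, 2, 4, 8), seed=0):
    """Coarse-to-fine generation: frozen prediction plus injected guidance per scale."""
    rng = random.Random(seed)
    outputs = [[rng.gauss(0.0, 1.0)]]  # coarsest scale
    for size in scales[1:]:
        base = frozen_var_step(outputs, size)   # frozen backbone, never updated
        guidance = injector(outputs, size)      # only this part would be trained
        outputs.append([b + g for b, g in zip(base, guidance)])
    return outputs


def discriminator(sample, w=1.0, b=0.0):
    """Hypothetical discriminator: a logit that is higher for 'more real' samples."""
    return w * (sum(sample) / len(sample)) + b


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def generator_loss(fake_logit):
    """Non-saturating adversarial loss (assumed): -log sigmoid(D(fake))."""
    return softplus(-fake_logit)


def discriminator_loss(real_logit, fake_logit):
    """Standard GAN discriminator loss (assumed)."""
    return softplus(-real_logit) + softplus(fake_logit)
```

With a zero-initialised injector the output equals that of the frozen backbone alone, which is one common way to make such an add-on module plug-and-play. In a real setup only the injector (and the discriminator) would receive gradient updates.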
Primary Area: generative models
Submission Number: 11397