Adversarially Injected Diagnosis for Coherent Visual Autoregressive Generation

Ligong Bi; Tao Huang; Chang Xu

18 Sept 2025 (modified: 13 Nov 2025); ICLR 2026 Conference Withdrawn Submission; CC BY 4.0

Keywords: Visual Autoregressive Models (VARs); Image Generation; Adversarial Learning; Generative Models

TL;DR: This paper introduces a lightweight, plug-and-play module that uses adversarial guidance to improve the global coherence of large, frozen generative models.

Abstract: Visual Autoregressive (VAR) models, despite their formidable generative capabilities, accumulate local prediction errors across scales, leading to detail loss and local distortions. To address this, we introduce AID-VAR, a plug-and-play method that improves pretrained VARs via Adversarially Injected Diagnosis. Inspired by GANs, we train a discriminator to detect visual errors in generated samples and use an adversarial objective to pull generations toward the manifold of real images. To avoid the computational and stability issues of directly updating the VAR, we attach a lightweight guidance injector that conditions on the previously generated scales of a pretrained, frozen VAR and injects adversarial features to guide the next scale. To quantify reductions in cross-scale errors, we introduce the Inter-Scale Consistency Score (ISCS), which measures the fidelity of transitions between consecutive scales. Across standard VAR backbones, AID-VAR delivers sharper details, fewer local distortions, and stronger global coherence while adding negligible parameters and minimal computational overhead. Our results establish AID-VAR as a practical pathway for upgrading large VAR generators with adversarial feedback, without modifying training data, base architecture, or sampling schedules. For instance, our AID-VAR-d20 improves FID by 16% with only a 3% increase in parameters.
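
The abstract does not say which adversarial loss or injection mechanism AID-VAR uses, so the following is only an illustrative sketch under two stated assumptions: the standard non-saturating GAN loss on discriminator logits, and a purely additive injection of guidance features into the frozen backbone's features. The function names (`discriminator_loss`, `injector_loss`, `inject`) and the `strength` parameter are hypothetical and do not come from the paper.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(real_logits, fake_logits):
    # Assumed standard GAN discriminator loss:
    # mean[-log sigmoid(real)] + mean[-log(1 - sigmoid(fake))].
    real_term = sum(softplus(-r) for r in real_logits) / len(real_logits)
    fake_term = sum(softplus(f) for f in fake_logits) / len(fake_logits)
    return real_term + fake_term


def injector_loss(fake_logits):
    # Assumed non-saturating generator-side loss: mean[-log sigmoid(fake)].
    # In this sketch only the injector would be trained with it;
    # the VAR backbone stays frozen.
    return sum(softplus(-f) for f in fake_logits) / len(fake_logits)


def inject(frozen_features, guidance, strength=1.0):
    # Hypothetical additive injection: frozen next-scale features plus
    # scaled guidance features computed from previously generated scales.
    return [h + strength * g for h, g in zip(frozen_features, guidance)]
```

Under these assumptions, the injector's loss falls as the discriminator rates guided samples as more realistic, and setting `strength=0.0` recovers the unmodified frozen backbone, which is one way to read the "plug-and-play" property.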

Primary Area: generative models

Submission Number: 11397
