Fast Inference of Visual Autoregressive Model with Adjacency-Adaptive Dynamical Draft Trees

11 Sept 2025 (modified: 19 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | Readers: Everyone | CC BY 4.0
Keywords: Speculative Decoding, Speculative Sampling, Accelerating Visual Autoregressive Models
TL;DR: PEANUT introduces dynamic draft trees that accelerate visual autoregressive models by adapting to the varying complexity of image regions.
Abstract: Autoregressive (AR) models have made significant strides in image generation, delivering quality comparable to diffusion-based methods. However, their sequential inference process incurs high computational costs, hindering efficiency and scalability. Although speculative decoding has proven effective in accelerating Large Language Models (LLMs), its adaptation to visual AR models, particularly with dynamic draft trees, remains largely unexplored. In this work, we identify a key obstacle in applying speculative decoding to visual AR models: acceptance rates are inconsistent across draft trees because prediction difficulty varies between image regions. To address this, we introduce Adjacency-Adaptive Dynamical Draft Trees, dubbed PEANUT, which dynamically adjust draft tree depth and width by leveraging the states of adjacent tokens and prior acceptance rates. PEANUT optimizes tree construction using spatial token relationships, achieving more stable acceleration and higher acceptance rates. Evaluations on text-to-image generation show that PEANUT dramatically outperforms draft-tree methods such as EAGLE-2 in inference efficiency while keeping image quality lossless, and that it can also be combined with techniques such as LANTERN that relax the sampling criteria.
Primary Area: generative models
Submission Number: 3944
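
Illustrative sketch (not the authors' implementation): the abstract says that draft tree depth and width are adjusted from adjacent token states and prior acceptance rates, but it does not give the rule. The Python below shows one plausible way such a rule could look, assuming tokens are generated in raster order on a 2D grid. Every name and constant here (`choose_tree_shape`, `neighbor_accept_lengths`, the depth and width bounds, the use of left and upper neighbors) is a hypothetical choice made for illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical record: accepted[r][c] is the number of draft tokens that were
# accepted in the verification step that produced token (r, c), or None if
# that position has not been generated yet.
Grid = List[List[Optional[int]]]


def clamp(value: int, low: int, high: int) -> int:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def neighbor_accept_lengths(accepted: Grid, row: int, col: int) -> List[int]:
    """Accepted lengths at the left and upper neighbors, where available."""
    lengths = []
    if col > 0 and accepted[row][col - 1] is not None:
        lengths.append(accepted[row][col - 1])
    if row > 0 and accepted[row - 1][col] is not None:
        lengths.append(accepted[row - 1][col])
    return lengths


def choose_tree_shape(
    accepted: Grid,
    row: int,
    col: int,
    prior_rate: float,
    min_depth: int = 1,
    max_depth: int = 6,
    min_width: int = 2,
    max_width: int = 8,
) -> Tuple[int, int]:
    """Pick (depth, width) for the draft tree rooted at position (row, col).

    Assumed heuristic: draft about one token deeper than the neighbors'
    average accepted length, and widen the tree when the running
    acceptance rate (prior_rate, in [0, 1]) is low.
    """
    neighbors = neighbor_accept_lengths(accepted, row, col)
    if neighbors:
        local = sum(neighbors) / len(neighbors)
    else:
        # No spatial context yet: fall back to the global acceptance rate.
        local = prior_rate * max_depth

    depth = clamp(int(local + 0.5) + 1, min_depth, max_depth)
    width_span = (1.0 - prior_rate) * (max_width - min_width)
    width = clamp(int(min_width + width_span + 0.5), min_width, max_width)
    return depth, width
```

In this sketch a region whose neighbors accepted long drafts gets a deeper tree, while a low running acceptance rate spreads the draft budget over more candidates per level. The resulting tree would still go through the standard speculative-decoding verification step, which is what keeps generation lossless.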