Continuous Speculative Decoding for Autoregressive Image Generation

17 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Speculative decoding, Visual Autoregressive Models
TL;DR: This paper proposes Continuous Speculative Decoding to accelerate the inference of continuous visual autoregressive models while maintaining generation quality.
Abstract: Continuous visual autoregressive (AR) models have demonstrated promising performance in image generation. However, their autoregressive inference imposes a significant computational burden. In Large Language Models (LLMs), speculative decoding has proven effective at accelerating discrete autoregressive inference, but the absence of an analogous theory for continuous distributions has precluded its use in accelerating continuous AR models. To fill this gap, this work presents continuous speculative decoding and addresses two challenges: 1) a low acceptance rate, caused by the inconsistent output distributions of the target and draft models, and 2) a modified distribution with no analytic expression, caused by an intractable integral. To address challenge 1), we introduce an approximate acceptance criterion that mitigates sampling inefficiency, together with denoising trajectory alignment and token pre-filling strategies. To address challenge 2), we introduce an acceptance-rejection sampling algorithm with an appropriate upper bound, thereby avoiding explicit computation of the integral. Furthermore, our denoising trajectory alignment is reused in acceptance-rejection sampling, effectively avoiding repeated diffusion model inference. Extensive experiments demonstrate that our proposed continuous speculative decoding achieves over $2\times$ speedup on off-the-shelf models while maintaining generation quality.
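To make the accept/reject logic concrete, below is a minimal sketch of one continuous speculative decoding step. It assumes the draft and target models expose evaluable densities q and p for each token (here stood in by 1-D Gaussians); all function names (`spec_decode_step`, `p_pdf`, `q_pdf`, `p_sampler`) are hypothetical, and the sketch omits the paper's denoising trajectory alignment and token pre-filling strategies.

```python
# Sketch of continuous speculative decoding: accept a draft token with
# probability min(1, p/q); on rejection, resample from the modified
# distribution r(x) ∝ max(p(x) - q(x), 0) via acceptance-rejection,
# which sidesteps computing the normalizing integral of r.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def spec_decode_step(x_draft, p_pdf, q_pdf, p_sampler):
    """Accept or resample one continuous draft token.

    x_draft   : sample drawn from the draft density q
    p_pdf     : callable, target density p(x)
    q_pdf     : callable, draft density q(x)
    p_sampler : callable, draws a fresh sample from p
    """
    # Speculative acceptance criterion for continuous densities.
    if rng.uniform() < min(1.0, p_pdf(x_draft) / q_pdf(x_draft)):
        return x_draft, True

    # Rejected: sample from r(x) ∝ max(p(x) - q(x), 0) with proposal p.
    # Since max(p - q, 0) <= p pointwise, p is a valid envelope, so the
    # acceptance probability is max(0, 1 - q(y)/p(y)).
    while True:
        y = p_sampler()
        if rng.uniform() < max(0.0, 1.0 - q_pdf(y) / p_pdf(y)):
            return y, False

# Toy usage with Gaussian draft/target densities.
p, q = norm(0.0, 1.0), norm(0.2, 1.1)
x = q.rvs(random_state=rng)
token, accepted = spec_decode_step(
    x, p.pdf, q.pdf, lambda: p.rvs(random_state=rng))
```

The envelope choice is what "an appropriate upper bound" buys: because the unnormalized residual max(p - q, 0) never exceeds p, proposals from p can be thinned without ever evaluating the residual's normalizing constant.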
Primary Area: generative models
Submission Number: 8689