STAR: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation

18 Sept 2025 (modified: 05 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Speculative decoding, Vision-language model
Abstract: Speculative decoding (SD) has proven to be an effective technique for accelerating autoregressive generation in large language models (LLMs); however, its application to vision-language models (VLMs) remains relatively unexplored. We propose STAR, a novel SD framework designed specifically for fast and efficient decoding in VLMs. STAR leverages a neural architecture search (NAS) framework with target-aware supernet training to automatically identify both the optimal interaction strategy between the draft and target models and the draft model architecture best suited to the underlying hardware platform. STAR additionally incorporates adaptive intermediate feature distillation, guided by attention entropy, to enable efficient draft training. Experiments on a range of well-established VLMs, including the LLaVA series, Pixtral, and SmolVLM, demonstrate that STAR achieves up to a $3.8\times$ speedup over standard decoding and significantly outperforms existing SD baselines in both inference throughput and speculative acceptance length across a wide spectrum of VLMs.
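For readers unfamiliar with speculative decoding, the following toy sketch illustrates the generic draft-then-verify loop that SD frameworks such as STAR build on. This is an illustrative simplification, not STAR's actual method: both "models" are hypothetical stand-in next-token functions, and verification is shown sequentially rather than as a single parallel target pass.

```python
# Toy illustration of the generic draft-then-verify loop in speculative
# decoding. The "models" below are hypothetical deterministic stand-ins.

def draft_model(prefix):
    # Hypothetical cheap draft model: predicts the next token as last + 1.
    return prefix[-1] + 1

def target_model(prefix):
    # Hypothetical target model: agrees with the draft except when the
    # next token would be a multiple of 4, where it diverges.
    nxt = prefix[-1] + 1
    return nxt if nxt % 4 != 0 else nxt + 10

def speculative_step(prefix, k=4):
    """Draft k tokens, then accept the longest prefix the target agrees with.

    Returns the extended sequence and the number of accepted draft tokens
    (the "acceptance length" that SD papers, including this one, report).
    """
    # 1) Draft model proposes k tokens autoregressively (cheap).
    drafted = list(prefix)
    for _ in range(k):
        drafted.append(draft_model(drafted))

    # 2) Target verifies the drafted positions (in practice, one parallel
    #    forward pass scores all k positions at once).
    seq = list(prefix)
    accepted = 0
    for i in range(k):
        t = target_model(seq)
        if t == drafted[len(prefix) + i]:
            seq.append(t)       # target agrees: accept the draft token
            accepted += 1
        else:
            seq.append(t)       # target disagrees: take its token, stop
            break
    return seq, accepted
```

With the toy models above, `speculative_step([1], k=4)` accepts two drafted tokens before the target diverges, extending the sequence by three tokens while invoking the (expensive) target only once per verified position; the speedup in real SD comes from batching that verification into a single pass.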
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13979