Track: Main paper track (up to 5 pages excluding references and appendix)
Keywords: Speculative decoding, Multimodal Large Language Models, Large Vision Language Models
TL;DR: We systematically explore speculative decoding for LVLMs and propose a new method, in-batch ensemble drafting.
Abstract: Despite the success of Speculative Decoding (SD) in LLM inference acceleration, it remains largely unexplored for Large Vision Language Models (LVLMs), an advanced class of LLMs that can handle multimodal prompts consisting of text and image tokens. To bridge this gap, we first conduct a comprehensive benchmarking study focusing on the effectiveness of various drafting methods. We observe that each drafting method has its own advantages, and none consistently outperforms the others. Motivated by this observation, we propose **In-batch Ensemble Drafting (IbED)**, a simple yet effective SD method for LVLMs. IbED leverages multiple drafting methods via batch inference without incurring much additional latency and, compared to multimodal drafting, consistently demonstrates significant improvements in block efficiency, averaging 6% (with a maximum of 23%) across a wide range of datasets.
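A minimal sketch of the idea described above, for illustration only: several drafters each propose a block of tokens, the target model verifies all proposals in a single batched forward pass, and the proposal with the longest accepted prefix is kept. The function names (`greedy_draft`, `in_batch_ensemble_step`), greedy verification, and toy stand-in models are assumptions for the sketch, not the paper's implementation.

```python
# Illustrative sketch of in-batch ensemble drafting (IbED); toy models, greedy verification.
from typing import Callable, List
import torch

def greedy_draft(draft_logits_fn: Callable[[torch.Tensor], torch.Tensor],
                 prefix: torch.Tensor, gamma: int) -> torch.Tensor:
    """Autoregressively draft `gamma` tokens with one (cheap) drafter."""
    seq = prefix.clone()
    for _ in range(gamma):
        next_tok = draft_logits_fn(seq)[-1].argmax(dim=-1, keepdim=True)
        seq = torch.cat([seq, next_tok])
    return seq[prefix.numel():]  # the gamma drafted tokens

def in_batch_ensemble_step(target_logits_fn: Callable[[torch.Tensor], torch.Tensor],
                           drafters: List[Callable[[torch.Tensor], torch.Tensor]],
                           prefix: torch.Tensor, gamma: int) -> torch.Tensor:
    """One SD step: every drafter proposes gamma tokens, the target verifies all
    proposals in one batched forward pass, and the best proposal is kept."""
    K, L = len(drafters), prefix.numel()
    drafts = torch.stack([greedy_draft(d, prefix, gamma) for d in drafters])   # (K, gamma)
    batch = torch.cat([prefix.expand(K, -1), drafts], dim=1)                   # (K, L+gamma)
    logits = target_logits_fn(batch)                                           # (K, L+gamma, V)
    preds = logits[:, L - 1:-1].argmax(dim=-1)                                 # target's choice per drafted position
    accepted = (preds == drafts).long().cumprod(dim=1).sum(dim=1)              # accepted prefix length per drafter
    best = int(accepted.argmax())
    n = int(accepted[best])
    bonus = logits[best, L - 1 + n].argmax(dim=-1, keepdim=True)               # one extra token from the target
    return torch.cat([drafts[best, :n], bonus])

# Toy usage with random stand-in "models" (purely illustrative):
V = 100
torch.manual_seed(0)
target = lambda x: torch.randn(*x.shape, V)
drafters = [lambda x: torch.randn(*x.shape, V) for _ in range(3)]
prefix = torch.randint(0, V, (8,))
print(in_batch_ensemble_step(target, drafters, prefix, gamma=4))
```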
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Minjae_Lee2
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Submission Number: 24