Keywords: speculative decoding, knowledge distillation, multi-trajectory supervision, inference acceleration, large language models
Abstract: The efficacy of speculative decoding (SD) is fundamentally constrained by the alignment between the draft and target models. Existing distillation approaches for SD rely on single-trajectory supervision, which induces exposure bias and degrades acceptance rates at inference time. To address this, we introduce \textbf{DistillBeam}, a framework that optimizes draft-target alignment via multi-trajectory distillation. By aggregating supervision from multiple high-probability teacher trajectories, DistillBeam approximates the target model's full structural support, thereby mitigating sequence drift. We further tackle the prohibitive storage overhead of multi-beam distillation by demonstrating that aggressive Top-$K$ truncation ($K=50$) reduces offline storage by 99.9\% without degrading alignment. Extensive evaluation across 20 languages shows that DistillBeam achieves wall-clock speedups of 35--65\% over autoregressive decoding, with particularly strong gains in morphologically rich languages where baseline methods struggle.
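The storage-reduction claim in the abstract can be illustrated with a minimal sketch (not code from the paper): truncating each stored teacher distribution to its Top-$K$ entries keeps $K/V$ of the vocabulary, which for the assumed $K=50$ and a hypothetical vocabulary of 50,000 tokens is exactly the 99.9\% reduction cited.

```python
# Hedged sketch, not the paper's implementation: Top-K truncation of a
# teacher distribution before offline storage. VOCAB_SIZE is an assumed
# illustrative value; K=50 is the truncation level from the abstract.
import heapq
import math
import random

VOCAB_SIZE = 50_000  # hypothetical vocabulary size for illustration
K = 50               # truncation level cited in the abstract


def top_k_truncate(probs, k=K):
    """Keep the k highest-probability (token_id, prob) pairs, renormalized."""
    top = heapq.nlargest(k, enumerate(probs), key=lambda pair: pair[1])
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}


# Toy teacher distribution: random logits passed through a softmax.
random.seed(0)
logits = [random.gauss(0.0, 2.0) for _ in range(VOCAB_SIZE)]
m = max(logits)
exps = [math.exp(x - m) for x in logits]
z = sum(exps)
probs = [e / z for e in exps]

sparse = top_k_truncate(probs)
reduction = 1 - len(sparse) / VOCAB_SIZE
print(f"stored entries: {len(sparse)} / {VOCAB_SIZE}")
print(f"storage reduction: {reduction:.1%}")  # prints 99.9%
```

The renormalization step keeps the truncated distribution a valid probability distribution, so it can still serve as a distillation target; whether renormalizing or zero-filling is used in the actual system is not specified by the abstract.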
Paper Type: Long
Research Area: LLM Efficiency
Research Area Keywords: Efficient/Low-Resource Methods for NLP, Generation, Machine Translation
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency
Languages Studied: French, Spanish, German, Portuguese, Italian, Chinese, Japanese, Korean, Arabic, Turkish, Hindi, Bengali, Tamil, Urdu, Telugu, Kannada, Malayalam, Marathi, Gujarati, Punjabi
Submission Number: 8506