DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification

Published: 16 Oct 2025, Last Modified: 26 Nov 2025 · NeurIPS 2025 ER Workshop · CC BY 4.0
Keywords: Speculative Decoding, Relaxed verification
Abstract: Speculative decoding accelerates LLM inference by letting a small draft model propose multiple tokens that a larger target model verifies in parallel, but rigid verification that enforces an exact distributional match rejects many plausible tokens and limits the achievable speedup. We first introduce Static Ensemble, a training‑free fixed‑weight mixture of the draft and target distributions that provably traces the Pareto‑optimal trade‑off between rejection probability and distributional bias. To further raise acceptance without sacrificing quality, we propose **Diversed** (DynamIc VErification RElaxed SpEculative Decoding), which learns context‑dependent mixing weights to form a flexible verification target. This relaxed verification admits safe tokens more often while preserving correctness. Theory and experiments show that **Diversed** achieves significantly higher inference efficiency than both conventional speculative decoding and the static baseline.
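The fixed-weight mixture verification described in the abstract can be sketched as a modified acceptance rule: instead of accepting a drafted token x with probability min(1, p(x)/q(x)) as in standard speculative decoding, the verifier distribution is replaced by a mixture of target and draft. The function and variable names (`accept_prob`, `lam`, `q_draft`, `p_target`) and the toy distributions below are illustrative assumptions, not taken from the paper.

```python
def accept_prob(token, q_draft, p_target, lam):
    """Acceptance probability under a relaxed (mixture) verification target.

    Standard speculative decoding accepts a drafted token x with probability
    min(1, p(x)/q(x)). Here the verifier distribution is a fixed-weight
    mixture m = (1 - lam) * p + lam * q, so tokens that are plausible under
    the draft are rejected less often; lam trades off rejection probability
    against distributional bias toward the draft model.
    """
    q = q_draft[token]
    m = (1.0 - lam) * p_target[token] + lam * q
    return min(1.0, m / q)

# Toy 3-token vocabulary where the draft and target disagree on token 0.
q = [0.6, 0.3, 0.1]   # draft model distribution
p = [0.2, 0.5, 0.3]   # target model distribution

strict = accept_prob(0, q, p, lam=0.0)   # exact verification: min(1, 0.2/0.6) = 1/3
relaxed = accept_prob(0, q, p, lam=1.0)  # fully relaxed: verifier equals draft, always accepts
```

At lam = 0 this reduces to exact speculative verification; at lam = 1 every drafted token is accepted but the output distribution collapses onto the draft model, which is the bias endpoint of the Pareto trade-off the paper analyzes.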
Submission Number: 165