Keywords: Token Tree, Speculative Decoding, Inference
TL;DR: We propose a classifier-based speculative decoding token tree construction method that significantly improves token tree accuracy, as validated across multiple models and benchmarks.
Abstract: With the increasing scale of Large Language Models (LLMs), inference latency and computational cost have become increasingly prominent concerns. Speculative decoding methods have emerged to alleviate these challenges, but existing tree construction strategies are inefficient at accurately preparing candidate token trees for the verification stage. To address this, we propose a plug-and-play method named C2T that leverages a lightweight three-feature classifier with only 241 parameters to dynamically generate and pre-prune token trees, and that additionally enables early stopping during token sequence inference. Our approach outperforms traditional probability-based dynamic token tree construction methods while introducing negligible computational overhead. We evaluated our method on multiple benchmarks and models and showed that, when combined with SOTA methods such as EAGLE-2/3, it can reduce the number of candidate tokens by 25% without sacrificing acceptance length, resulting in a 7% to 17% speedup across models of different sizes.
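The abstract only fixes the classifier's size (three input features, 241 parameters) and its role (scoring candidate nodes so the token tree can be pre-pruned); the exact architecture and feature set are not given here. The following is a minimal NumPy sketch under the assumption of a 3-to-48-to-1 MLP, which happens to have exactly 3*48 + 48 + 48 + 1 = 241 parameters; the feature names and the threshold-based pruning rule are likewise illustrative assumptions, not the paper's specification.

```python
import numpy as np

# Hypothetical classifier: a 3 -> 48 -> 1 MLP. This specific architecture is
# an assumption chosen because it totals exactly 241 parameters, matching the
# abstract; the paper's actual classifier may differ.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 48)), np.zeros(48)   # 144 + 48 params
W2, b2 = rng.normal(size=(48, 1)), np.zeros(1)    #  48 +  1 params

def num_params() -> int:
    return W1.size + b1.size + W2.size + b2.size  # 241

def score(features: np.ndarray) -> np.ndarray:
    """Score candidate tree nodes.

    features: (n_nodes, 3) array; e.g. draft-model probability, node depth,
    and parent score -- illustrative feature choices, not the paper's.
    Returns a (n_nodes,) array of acceptance scores in (0, 1).
    """
    h = np.maximum(features @ W1 + b1, 0.0)            # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2))).ravel()  # sigmoid output

def prune(features: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Keep only candidate nodes whose score clears the threshold."""
    return np.flatnonzero(score(features) >= threshold)
```

Because the classifier is this small, scoring every node of a candidate tree adds negligible overhead relative to a verification forward pass, which is consistent with the abstract's claim of reduced candidate counts at no cost to acceptance length.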
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16509