Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding

Published: 2025, Last Modified: 07 Jan 2026, Venue: ICML 2025, License: CC BY-SA 4.0