QuickMerge++: Token Merging with Autoregressive Prior

ICML 2025 Workshop TokShop, Submission 4

Published: 10 Jun 2025, Last Modified: 12 Jun 2025
License: CC BY 4.0
Archiving Submission: Yes (archival)
Keywords: Token Merging, Token Reduction, Autoregressive Learning, Efficient AI
Abstract: As generative models scale to larger inputs across language, vision, and video domains, the cost of token-level computation has become a key bottleneck. While prior work suggests that only a subset of tokens significantly influence downstream predictions, most token selection methods are static, modality-specific, or incompatible with autoregressive generation. In this paper, we propose QuickMerge, a lightweight token merging framework designed for efficient next-token prediction. QuickMerge dynamically selects a reduced number of tokens based on attention norm magnitude, guided by an entropy-based budget estimator. To preserve autoregressive compatibility, we introduce a lightweight transformer prior trained over the merged token sequence. By combining semantic salience estimation, flexible token budgets, and AR alignment, QuickMerge enables accurate generation with fewer tokens. We evaluate QuickMerge across text (WikiText), image (ImageNet), and video (UCF101) domains, demonstrating consistent improvements in compute-accuracy tradeoffs. Specifically, QuickMerge reduces token counts sustantially while matching or exceeding the performance of learned tokenizers and fixed-patch baselines.
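The pipeline the abstract describes (attention-norm salience, an entropy-based token budget, then merging down to the budget) might look roughly like the following sketch. All function names, the normalized-entropy budget rule, and the average-pool merge of dropped tokens into their most similar kept token are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def entropy_budget(salience, n_min=4):
    # Normalize salience into a distribution and use its entropy to set
    # the token budget: peaked salience -> keep few tokens, flat -> keep many.
    p = salience / salience.sum()
    h = -(p * np.log(p + 1e-12)).sum()
    frac = h / np.log(len(p))              # normalized entropy in [0, 1]
    return max(n_min, int(round(frac * len(p))))

def token_merge_sketch(tokens, attn):
    """tokens: (N, d) token embeddings; attn: (N, N) attention matrix."""
    # Per-token salience as the norm of the attention each token receives.
    salience = np.linalg.norm(attn, axis=0)
    k = entropy_budget(salience)
    keep = np.sort(np.argsort(salience)[-k:])   # top-k tokens, original order
    merged = tokens[keep].copy()
    counts = np.ones(k)
    kept_set = set(keep.tolist())
    # Merge each dropped token into its most similar kept token by
    # running-average pooling (a stand-in for the paper's merge step).
    for i in range(len(tokens)):
        if i in kept_set:
            continue
        j = int(np.argmax(tokens[keep] @ tokens[i]))
        merged[j] = (merged[j] * counts[j] + tokens[i]) / (counts[j] + 1)
        counts[j] += 1
    return merged, keep
```

The entropy-based budget makes the compression ratio input-dependent: near-uniform attention (high entropy) keeps most tokens, while sharply concentrated attention keeps only the dominant few, which is the flexible-budget behavior the abstract claims.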
Submission Number: 4