Beyond Scattered Acceptance: Fast and Coherent Inference for DLMs via Longest Stable Prefixes

ICLR 2026 Conference Submission 774 Authors

02 Sept 2025 (modified: 23 Dec 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: diffusion language model, efficient, block
Abstract: Diffusion Language Models (DLMs) promise parallel generation via iterative denoising, yet their practical speed is often throttled by \emph{schedulers} that accept scattered high-confidence tokens, fragmenting KV caches and forcing repeated local repairs. We present \emph{Prefix Absorption}, a training-free inference principle operationalized by the \emph{Longest Stable Prefix} (LSP) scheduler. In each iteration, LSP performs a single forward pass to locate the longest left-aligned run whose predictions are both high-margin and temporally stable, then snaps the candidate boundary to natural structural delimiters (e.g., punctuation or code boundaries) before atomically committing the block. This prefix-first topology preserves a single frozen/active boundary, converts KV updates into contiguous appends, and concentrates attention on a rapidly shrinking suffix. As a consequence, the active sequence length decays geometrically and the total work bends from an effectively cubic $O(N^3)$ regime toward near-quadratic $O(N^2)$ while maintaining coherence. On code generation (HumanEval, MBPP) and complex reasoning (GSM8K, GPQA) with LLaDA-8B and Dream-7B, LSP substantially reduces end-to-end latency and denoiser calls while matching or improving task quality relative to strong scattered-acceptance baselines. Ablations isolate the gains to LSP’s core components—adaptive block sizing, structural boundary snapping, and the prefix-first commitment topology—demonstrating that faster DLM inference can be achieved without retraining and is complementary to existing diffusion schedules.
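The abstract describes the per-iteration LSP selection step (longest high-margin, temporally stable left-aligned run, followed by boundary snapping to structural delimiters). Below is a minimal sketch of that step, not the authors' implementation; the names `margin_tau`, `delimiter_ids`, `tokens_prev`, and `min_block` are illustrative assumptions rather than quantities defined in the paper.

```python
import numpy as np

def longest_stable_prefix(margins, tokens_now, tokens_prev, delimiter_ids,
                          margin_tau=0.9, min_block=1):
    """Return how many leftmost active positions to commit this iteration.

    margins      : (L,) per-position confidence margin (e.g., top-1 minus top-2 prob)
    tokens_now   : (L,) argmax token ids from the current denoising pass
    tokens_prev  : (L,) argmax token ids from the previous pass (for temporal stability)
    delimiter_ids: set of token ids treated as structural boundaries
                   (hypothetical choice, e.g., punctuation or newline ids)
    """
    L = len(margins)

    # 1) Longest left-aligned run that is both high-margin and unchanged
    #    since the previous pass.
    run = 0
    while run < L and margins[run] >= margin_tau and tokens_now[run] == tokens_prev[run]:
        run += 1
    if run < min_block:
        return 0  # nothing safe to freeze this iteration

    # 2) Snap the candidate boundary back to the last structural delimiter
    #    inside the run, so the committed block ends at a natural break.
    snapped = run
    for i in range(run - 1, -1, -1):
        if int(tokens_now[i]) in delimiter_ids:
            snapped = i + 1
            break
    return snapped if snapped >= min_block else run
```

Under this sketch, the first `snapped` tokens would be committed atomically (a contiguous KV append), and only the remaining suffix stays active in subsequent denoising passes, which is the mechanism the abstract credits for the geometric decay of the active length.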
Primary Area: generative models
Submission Number: 774