OneFlowSeq: Achieving One-Step Generation for Diffusion Language Models via Lightweight Distillation

ICLR 2026 Conference Submission585 Authors

01 Sept 2025 (modified: 23 Dec 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Diffusion Language Models; Seq2Seq
TL;DR: OneFlowSeq distills a multi-step diffusion teacher into a one-step generator via MeanFlow and Jacobian-vector-product signals, achieving SOTA Seq2Seq quality with fewer trainable parameters, lower training cost, and orders-of-magnitude faster inference.
Abstract: Autoregressive models dominate Seq2Seq generation but suffer from slow, error-prone token-by-token decoding. Diffusion language models (DLMs) enable parallel refinement and global coherence, yet their iterative denoising requires hundreds of steps, limiting practicality. We propose **OneFlowSeq**, a novel framework that distills a powerful multi-step diffusion teacher (LLaDA-8B-Instruct) into a one-step generator via MeanFlow-based supervision and parameter-efficient prompt tuning. OneFlowSeq introduces a Jacobian-vector product signal that provides richer guidance than conventional distillation, allowing the student to match the quality of the 128-step teacher with a single generation step. Experiments on paraphrasing, text simplification, and question generation benchmarks show that OneFlowSeq achieves state-of-the-art performance while reducing trainable parameters by 1600$\times$ and delivering inference speeds orders of magnitude faster than both autoregressive and multi-step diffusion baselines. This work establishes one-step diffusion as a practical and scalable paradigm for Seq2Seq generation.
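To make the MeanFlow-plus-JVP supervision described in the abstract concrete, below is a minimal, illustrative sketch (not the authors' implementation) of a MeanFlow-style distillation loss in PyTorch. It assumes a frozen teacher that supplies an instantaneous velocity $v(z_t, t)$ and a small student network predicting the average velocity $u(z_t, r, t)$; the target $u_{\text{tgt}} = v - (t - r)\,\mathrm{d}u/\mathrm{d}t$ uses a Jacobian-vector product along the flow direction. All names (`MeanVelocityNet`, `teacher_velocity`, `d_model`) are hypothetical placeholders.

```python
# Hedged sketch of MeanFlow-style distillation with a JVP-based target.
# Not the paper's code: the teacher here is a toy stand-in, and the student
# is a tiny MLP used only to show the shape of the computation.
import torch
import torch.nn as nn
from torch.func import jvp

d_model = 32  # illustrative hidden size


class MeanVelocityNet(nn.Module):
    """Toy student predicting the average velocity u(z_t, r, t)."""

    def __init__(self, d: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d + 2, 4 * d), nn.SiLU(), nn.Linear(4 * d, d))

    def forward(self, z, r, t):
        # Broadcast the scalar times (r, t) across the batch as extra features.
        rt = torch.stack([r, t], dim=-1).expand(z.shape[0], 2)
        return self.net(torch.cat([z, rt], dim=-1))


def meanflow_distill_loss(student, teacher_velocity, z_t, r, t):
    """MeanFlow-style target: u_tgt = v(z_t, t) - (t - r) * d/dt u(z_t, r, t),
    where the total time-derivative is obtained with one jvp call and the
    target is treated as a constant (stop-gradient)."""
    v = teacher_velocity(z_t, t)  # instantaneous velocity from the (frozen) teacher
    # Total derivative along the flow: dz/dt = v, dr/dt = 0, dt/dt = 1.
    with torch.no_grad():
        _, du_dt = jvp(
            lambda z, r_, t_: student(z, r_, t_),
            (z_t, r, t),
            (v, torch.zeros_like(r), torch.ones_like(t)),
        )
        u_target = v - (t - r) * du_dt
    u_pred = student(z_t, r, t)
    return ((u_pred - u_target) ** 2).mean()


# Toy usage with random tensors.
student = MeanVelocityNet(d_model)
teacher_velocity = lambda z, t: -z  # stand-in for the frozen multi-step teacher
z_t = torch.randn(8, d_model)
r, t = torch.tensor(0.0), torch.tensor(0.7)
loss = meanflow_distill_loss(student, teacher_velocity, z_t, r, t)
loss.backward()
```

In this sketch only the student receives gradients; the teacher and the JVP-derived target are held fixed, mirroring the distillation setup the abstract describes at a high level.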
Primary Area: foundation or frontier models, including LLMs
Submission Number: 585