Unifying Autoregressive and Discrete Diffusion Language Modeling via Cross-Regressive Decoding

Published: 02 Mar 2026, Last Modified: 02 Mar 2026 · ICLR 2026 Workshop MM Intelligence Poster · CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: generative models, energy based, sequence modeling, non-autoregressive, MPC, discrete diffusion, language modeling, control theory, speculative decoding
Abstract: Autoregressive language models are trained to generate text one token at a time, causing inference latency and cost to scale linearly with output length. However, modern large language models often exhibit semi-autoregressive predictive capabilities, frequently aided by speculative decoding or other multi-token prediction methods. In contrast, discrete diffusion models promise parallel text generation but fundamentally struggle to model sequential correlations due to their reliance on mean-field approximations, which ignore the causality inherent in natural language. We introduce $\textbf{Cross-Regression}$, an approach aimed at achieving true hybridization of autoregressive and discrete diffusion sequence modeling. Cross-Regression couples a parallel $\textit{predictive stream}$ to exact causal probabilities from a $\textit{control stream}$. At inference time, Cross-Regression computes proposal and verification signals jointly in a single shared forward pass, using residual-energy acceptance to early-accept multiple tokens and a residual correction step to avoid discarding computation after mismatches. The method provides an explicit knob between $\textit{lossless sampling}$ and a faster $\textit{lossy regime}$ with controllable deviation. Across models from 1.5B to 70B parameters, we observe strong scaling of acceptance length and realize $3$–$6\times$ speedups with near-complete quality retention across reasoning, code, and dialogue benchmarks, and we demonstrate $\textbf{modality transfer}$ by accelerating Whisper decoding.
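The abstract's acceptance mechanism can be sketched as follows. This is a minimal, hedged reading of the described procedure, not the authors' released code: we assume the predictive stream drafts tokens with log-probabilities `proposal_logps`, the control stream scores the same tokens with exact causal log-probabilities `control_logps`, and a residual-energy test with a deviation knob `tau` (all names hypothetical) decides how many leading drafted tokens to accept, in the spirit of speculative-decoding acceptance.

```python
import math
import random

def cross_regressive_accept(proposal_logps, control_logps, tau=0.0, rng=None):
    """Sketch of a residual-energy acceptance rule (our interpretation).

    proposal_logps: log-probs the parallel predictive stream assigned to
        its drafted tokens.
    control_logps: exact causal log-probs from the control stream for the
        same tokens.
    tau: deviation knob; tau=0 corresponds to a lossless speculative-style
        test, tau>0 relaxes it toward the faster lossy regime.

    Returns the number of leading drafted tokens accepted.
    """
    rng = rng or random.Random(0)
    accepted = 0
    for q, p in zip(proposal_logps, control_logps):
        # Residual energy p - q: how the exact causal model scores the
        # draft relative to the proposal. Accept token with probability
        # min(1, exp(p - q + tau)); compare in log space for stability.
        if math.log(rng.random() + 1e-12) < (p - q + tau):
            accepted += 1
        else:
            break  # first rejection ends the accepted prefix
    return accepted
```

Under this reading, the "residual correction step" from the abstract would then resample or adjust the first rejected position rather than discarding the whole draft; we omit that step here since the abstract does not specify its form.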
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 80