Esoteric Language Models

ICLR 2026 Conference Submission 14645 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Diffusion Language Models, discrete diffusion
TL;DR: Eso-LMs are a new family of hybrid language models that unify autoregressive and masked diffusion modeling, unlock full KV caching for fast inference, and achieve a new state of the art on the generation speed–quality Pareto frontier.
Abstract: Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Among this family of models, Masked Diffusion Models (MDMs) achieve the strongest performance but still underperform AR models in perplexity and lack key inference-time efficiency features, most notably KV caching. In this work, we introduce Eso-LMs, a new family of models that fuses the AR and MDM paradigms, enabling smooth interpolation between their perplexities while overcoming their respective limitations. Crucially, we introduce KV caching for MDMs while preserving parallel generation, significantly improving inference efficiency. Combined with an optimized sampling schedule, our method achieves a new state of the art on the speed–quality Pareto frontier for unconditional generation. On long contexts, our method achieves 14–65× faster inference than standard MDMs and 3–4× faster inference than prior semi-autoregressive approaches.
Primary Area: generative models
Submission Number: 14645
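The abstract's central efficiency claim is KV caching for masked diffusion sampling while still revealing several tokens per step (standard MDMs re-encode the entire sequence at every denoising step). The sketch below is a minimal, hypothetical illustration of that general idea only, not the paper's Eso-LM architecture: a toy single-attention-layer denoiser (the names `qkv` and `sample`, the confidence-based unmasking schedule, and all hyperparameters are assumptions) that recomputes states only for still-masked positions and reuses cached key/value states for tokens once they are revealed.

```python
# Toy sketch: masked-diffusion sampling with a KV cache (illustrative only;
# this is NOT the Eso-LM design from the paper).
import torch
import torch.nn.functional as F

VOCAB, DIM, LENGTH, MASK_ID, STEPS = 100, 64, 16, 0, 4
torch.manual_seed(0)

embed = torch.nn.Embedding(VOCAB, DIM)
to_qkv = torch.nn.Linear(DIM, 3 * DIM)
to_logits = torch.nn.Linear(DIM, VOCAB)


def qkv(tokens):
    """Query/key/value states for the given token ids (no positions; toy model)."""
    q, k, v = to_qkv(embed(tokens)).chunk(3, dim=-1)
    return q, k, v


@torch.no_grad()
def sample():
    seq = torch.full((LENGTH,), MASK_ID)           # start fully masked
    finalized = torch.zeros(LENGTH, dtype=torch.bool)
    k_cache = torch.zeros(LENGTH, DIM)             # cached keys of revealed tokens
    v_cache = torch.zeros(LENGTH, DIM)             # cached values of revealed tokens
    per_step = LENGTH // STEPS

    for _ in range(STEPS):
        active = ~finalized                        # only masked positions are recomputed
        q, k, v = qkv(seq[active])
        # Attend over cached (frozen) states plus freshly computed active states.
        keys = torch.cat([k_cache[finalized], k])
        vals = torch.cat([v_cache[finalized], v])
        attn = F.softmax(q @ keys.T / DIM ** 0.5, dim=-1)
        logits = to_logits(attn @ vals)

        # Reveal the most confident masked positions in parallel (toy schedule).
        conf, pred = logits.max(dim=-1)
        idx = torch.nonzero(active).squeeze(-1)
        top = conf.topk(min(per_step, idx.numel()))
        chosen = idx[top.indices]
        seq[chosen] = pred[top.indices]

        # Cache K/V for the newly revealed tokens so later steps skip re-encoding them.
        _, k_new, v_new = qkv(seq[chosen])
        k_cache[chosen], v_cache[chosen] = k_new, v_new
        finalized[chosen] = True
    return seq


print(sample())
```

The speedups reported in the abstract rest on the same principle this toy loop illustrates: once a token's value is fixed, its key/value states need not be recomputed at later denoising steps, whereas a vanilla MDM re-runs the full sequence through the denoiser on every step.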