A Continuous Time Markov Chain Framework for Insertion Language Models

Published: 03 Feb 2026, Last Modified: 06 Feb 2026 · AISTATS 2026 Spotlight · CC BY 4.0
TL;DR: This paper presents a unifying perspective on insertion language models through the lens of denoising continuous-time Markov chains.
Abstract: Sequence generation through insertion of tokens offers several advantages over left-to-right generation and mask-based generation, especially for planning tasks that require look-ahead and for tasks that must satisfy relative ordering constraints. Existing formulations of insertion-based generation have largely been ad hoc. In this paper, we derive a diffusion-style denoising objective from first principles by formulating the noising process as a continuous-time Markov chain, defined over the space of variable-length sequences, that drops tokens uniformly at a time-dependent rate. We show that, under certain approximations, previous formulations of insertion language models can be viewed as special cases of this denoising framework. We propose new network parameterizations that explicitly model the rate matrix of the generative Markov chain, leading to principled sampling procedures. Through empirical evaluation on a synthetic planning task, we show that the proposed approach retains the benefits of insertion-based generation over left-to-right generation and masked diffusion models. In language modeling, our diffusion-based approach is competitive with left-to-right generation and masked diffusion models, while offering additional flexibility in sampling compared to existing insertion language models.
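The abstract describes the forward noising chain only in words. As a minimal sketch (our own illustration, not the paper's code), assuming each token is deleted independently at a time-dependent rate β(t), a token survives to time t with probability exp(-∫₀ᵗ β(s) ds), a standard fact for independent exponential deletion events in a CTMC. The schedule `beta` and the helper names below are hypothetical:

```python
import math
import random

def survival_prob(t: float, beta=lambda s: 1.0, steps: int = 1000) -> float:
    """Probability that a token survives until time t when tokens are
    deleted independently at time-dependent rate beta(s): this equals
    exp(-int_0^t beta(s) ds); the integral is approximated by a
    midpoint Riemann sum."""
    ds = t / steps
    integral = sum(beta((i + 0.5) * ds) for i in range(steps)) * ds
    return math.exp(-integral)

def noise_sequence(tokens: list[str], t: float, beta=lambda s: 1.0) -> list[str]:
    """Sample x_t from the forward (noising) chain: each token of the clean
    sequence x_0 survives with probability alpha(t) = exp(-int_0^t beta(s) ds).
    Relative order is preserved, so x_t is a random-length subsequence of x_0;
    the reverse (generative) chain would insert the dropped tokens back."""
    alpha = survival_prob(t, beta)
    return [tok for tok in tokens if random.random() < alpha]

if __name__ == "__main__":
    x0 = "the cat sat on the mat".split()
    for t in (0.1, 0.5, 2.0):
        print(t, noise_sequence(x0, t))  # longer t -> shorter subsequences
```

How the paper parameterizes the reverse-time rate matrix over variable-length sequences is not recoverable from the abstract, so the sketch covers only the noising direction.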
Submission Number: 396