Keywords: language modeling, non-autoregressive text generation
TL;DR: Sequence modeling by inserting tokens at arbitrary positions
Abstract: Autoregressive models (ARMs), which generate sequences by predicting tokens from left to right, have achieved significant success across a wide range of sequence generation tasks. However, they struggle to accurately represent sequences that require satisfying sophisticated constraints or whose sequential dependencies are better addressed by out-of-order generation. Masked Diffusion Models (MDMs) address some of these limitations, but they struggle to generate variable-length sequences and cannot handle arbitrary infilling constraints when the number of tokens to be filled in is not known in advance.
We revisit the idea of generation by insertion and introduce Insertion Language Models (ILMs), which learn to insert tokens at arbitrary positions in a sequence---that is, they jointly select both the position and the vocabulary element to be inserted. The ability to generate sequences in arbitrary order allows ILMs to accurately model sequences where token dependencies do not follow a left-to-right sequential structure, while retaining the ability to infill and to generate sequences of variable length. To train ILMs, we propose a tailored network parameterization with a single transformer encoder and use a simple denoising loss. Through empirical evaluation on planning tasks we demonstrate the aforementioned failure modes of ARMs and MDMs, and show that ILMs overcome them.
Furthermore, we show that ILMs perform on par with ARMs and better than MDMs in unconditional text generation while offering greater flexibility than MDMs in arbitrary-length text infilling.
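The abstract describes ILMs as jointly selecting a position and a token to insert at each step. Below is a minimal, hypothetical sketch of what such a sampling loop could look like; it is not the authors' implementation, and all names (`model`, `stop_token`, the logit layout of one row per insertion slot) are assumptions made for illustration.

```python
import torch

def sample_ilm(model, prompt, stop_token, max_steps=128, temperature=1.0):
    """Hypothetical ILM sampling loop: repeatedly insert a (position, token) pair.

    Assumes `model(seq)` returns logits of shape [len(seq) + 1, vocab_size],
    where entry (i, v) scores inserting token v into the gap before position i.
    """
    seq = list(prompt)
    for _ in range(max_steps):
        with torch.no_grad():
            logits = model(torch.tensor(seq).unsqueeze(0))[0]  # [len(seq)+1, vocab]
        # Flatten so position and token are drawn jointly from one distribution.
        probs = torch.softmax(logits.flatten() / temperature, dim=-1)
        choice = torch.multinomial(probs, 1).item()
        slot, token = divmod(choice, logits.size(-1))
        if token == stop_token:      # model signals that the sequence is complete
            break
        seq.insert(slot, token)      # grow the sequence at the chosen gap
    return seq
```

Because the position is sampled together with the token, the loop can fill gaps of a priori unknown size, which is the flexibility the abstract contrasts with MDM infilling.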
Primary Area: generative models
Submission Number: 21566