Variational Learning for Insertion-based Generation

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, Venue: ICML 2026 spotlight, License: CC BY 4.0
TL;DR: We introduce Insertion Process (IP), a probabilistic framework for insertion-based sequence generation that jointly learns where to insert, what to insert, and when to terminate.
Abstract: Non-monotonic sequence generation methods, such as masked diffusion models, provide a flexible alternative to left-to-right autoregressive modeling by allowing tokens to be generated in orders that are neither fixed nor prescribed. Despite their practical advantages, most existing non-monotonic models are order-agnostic and rely on a fixed-length grid, limiting their ability to support variable-length generation and adaptive insertion order. In this work, we introduce a probabilistic framework for learning insertion order in variable-length insertion models. We formalize a bijective correspondence between insertion trajectories and permutations, which enables an exact reparameterization of the data likelihood as a sum over permutations. Building on this result, we propose the Insertion Process (IP), a stochastic generative model that jointly learns where to insert, what to insert, and when to terminate, trained via permutation-based variational inference. Unlike prior fixed-canvas approaches, IP natively supports variable-length generation and learns data-driven preferences over insertion orders. Experiments on goal-conditioned planning and molecular string generation demonstrate that learning insertion order improves both modeling quality and generalization in domains without a canonical left-to-right structure.
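The correspondence between insertion trajectories and permutations mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's formalization or code: the function names `permutation_to_trajectory` and `trajectory_to_permutation` are hypothetical, a trajectory is assumed to be a list of `(slot, token)` pairs where `slot` is the position in the current partial sequence, and the termination decision is not modeled.

```python
import bisect
from itertools import permutations
from typing import List, Sequence, Tuple

Trajectory = List[Tuple[int, str]]  # assumed encoding: (slot in partial sequence, token)


def permutation_to_trajectory(tokens: Sequence[str], order: Sequence[int]) -> Trajectory:
    """order[t] is the final-sequence index of the token inserted at step t."""
    placed: List[int] = []  # final indices inserted so far, kept sorted
    trajectory: Trajectory = []
    for idx in order:
        # The slot equals the number of already-placed tokens that end up to the left.
        slot = bisect.bisect_left(placed, idx)
        trajectory.append((slot, tokens[idx]))
        placed.insert(slot, idx)
    return trajectory


def trajectory_to_permutation(trajectory: Trajectory) -> Tuple[List[str], List[int]]:
    """Replay the insertions and recover the final sequence and the generation order."""
    steps_in_place: List[int] = []  # step numbers, arranged left to right
    for step, (slot, _token) in enumerate(trajectory):
        steps_in_place.insert(slot, step)
    tokens = [trajectory[step][1] for step in steps_in_place]
    order = [0] * len(steps_in_place)
    for final_idx, step in enumerate(steps_in_place):
        order[step] = final_idx
    return tokens, order


# Example: build "C=O" by inserting "=" first, then "C" to its left, then "O" at the end.
example_tokens = ["C", "=", "O"]
example_order = [1, 0, 2]
example_trajectory = permutation_to_trajectory(example_tokens, example_order)

# Check that every permutation of a length-4 sequence maps to a distinct trajectory.
four = ["a", "b", "c", "d"]
distinct = {tuple(permutation_to_trajectory(four, p)) for p in permutations(range(4))}
```

For a fixed target sequence of length n, each of the n! generation orders yields a different sequence of insertion slots, and replaying the slots recovers the order, which is the sense in which a sum over trajectories can be rewritten as a sum over permutations.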
Lay Summary: The **Insertion Process (IP)** is a new AI framework that builds sequences from the inside out, inserting information wherever it is needed. Standard AI models generate data strictly left to right, which is inefficient for complex tasks like structured planning or biological design. Other models generate in non-linear orders but require a fixed canvas, meaning the exact output length must be known in advance. IP learns three things simultaneously: where to insert a token, what token to insert, and when to stop. By using a mathematical correspondence between insertion trajectories and permutations, it naturally handles variable-length outputs without a pre-set canvas. We evaluated IP on two benchmarks: 1) _Maze planning_: IP substantially outperformed traditional left-to-right models at navigating complex grid mazes and graphs. 2) _Molecular sequence generation_: When generating chemical codes (SMILES strings), IP learned, without being told to, to build a molecule's structural skeleton first and then insert atoms. It also successfully connected molecular fragments in linker design tasks with nearly 100% chemical validity.
Primary Area: Probabilistic Methods->Variational Inference
Keywords: Generative Modeling, Variational Inference, Discrete Diffusion, Autoregressive Model, Molecule Generation
Submission Number: 25754