InsNeXt: Training Scalable Insertion-based Language Models from Scratch

11 May 2025 (modified: 29 Oct 2025) · Submitted to NeurIPS 2025 · CC BY 4.0
Keywords: language model, controllable generation, insertion-based model
TL;DR: We propose an insertion-based language model with modern scalability and pretrain it from scratch.
Abstract: Insertion-based language models such as Insertion Transformer and InsNet have shown promise as strong alternatives to autoregressive models, offering better inference-time efficiency and controllability. However, their training-time scalability has been limited by computational inefficiency and obsolete model designs. We aim to tackle this problem with \textbf{InsNeXt}, an insertion-based language model architecture that integrates recent advances in language model systems to achieve improved scalability. We scale InsNeXt from 154M up to 0.6B parameters with a context window of 4096 by combining sentence-level and document-level training, which better encodes context and brings out the strength of insertion-based models in encoding bidirectional context. In addition, we propose a novel context encoding mechanism specialized for insertion-based decoding. This inference-time mechanism sparsely introduces bidirectional re-encoding of the context, effectively leveraging the models' bidirectional context reception while preserving the same level of computational efficiency as conventional autoregressive decoding. We evaluate the pretrained InsNeXt models from the perspectives of representation learning, commonsense reasoning, and controllable generation. InsNeXt models achieve similar or better performance compared to state-of-the-art autoregressive models of similar size, making them solid representation learners and powerful controllable insertion-based generators.
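The abstract describes insertion-based decoding in which a full bidirectional re-encoding of the context is triggered only sparsely, with cheaper incremental updates in between. Below is a minimal, illustrative Python sketch of that general idea; the class and method names (`ToyInsertionDecoder`, `bidirectional_encode`, `predict_insertion`, `reencode_every`) are hypothetical placeholders, not the paper's actual architecture or API.

```python
# Illustrative sketch (NOT the paper's implementation): insertion-based decoding
# with sparse bidirectional re-encoding. Every `reencode_every` steps the whole
# sequence is re-encoded bidirectionally; other steps reuse cached states and
# only patch in the newly inserted token, keeping per-step cost close to
# conventional autoregressive decoding.

from dataclasses import dataclass, field
from typing import List, Tuple
import random


@dataclass
class ToyInsertionDecoder:
    vocab: List[str]
    reencode_every: int = 4                          # sparsity of full re-encoding
    cache: List[str] = field(default_factory=list)   # stand-in for cached hidden states

    def bidirectional_encode(self, tokens: List[str]) -> List[str]:
        # Placeholder for a full bidirectional pass over the current sequence.
        return list(tokens)

    def predict_insertion(self, states: List[str]) -> Tuple[int, str]:
        # Placeholder policy: choose a random slot and token to insert.
        pos = random.randint(0, len(states))
        tok = random.choice(self.vocab)
        return pos, tok

    def generate(self, prompt: List[str], steps: int) -> List[str]:
        tokens = list(prompt)
        self.cache = self.bidirectional_encode(tokens)
        for step in range(steps):
            if step % self.reencode_every == 0:
                # Sparse step: refresh all cached states bidirectionally.
                self.cache = self.bidirectional_encode(tokens)
            pos, tok = self.predict_insertion(self.cache)
            tokens.insert(pos, tok)
            # Cheap incremental update for the inserted token only (stand-in).
            self.cache.insert(pos, tok)
        return tokens


if __name__ == "__main__":
    decoder = ToyInsertionDecoder(vocab=["the", "cat", "sat", "on", "mat"])
    print(decoder.generate(["<s>", "</s>"], steps=8))
```

The sketch only mirrors the control flow implied by the abstract (sparse full re-encoding interleaved with incremental updates); the actual model, caching scheme, and insertion policy are specified in the paper itself.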
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 19583