Meet in the Middle: A New Pre-training Paradigm

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 poster
Keywords: language modeling, pre-training, deep learning, NLP
TL;DR: We propose a new pre-training objective for autoregressive LMs and show that models trained this way achieve better perplexity and improve on many other metrics.
Abstract: Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion, predicting the next token from the preceding ones. However, this ignores that the full sequence is available during training. In this paper, we introduce "Meet in the Middle" (MIM), a new pre-training paradigm that improves data efficiency by training in two directions, left-to-right and right-to-left, and encouraging the respective models to agree on their token distributions at each position. While the primary outcome is an improved left-to-right LM, we also obtain secondary benefits on the infilling task, where we leverage the two pre-trained directions to propose an infilling procedure that builds the completion simultaneously from both sides. We conduct extensive experiments on both programming and natural languages and show that MIM significantly surpasses existing pre-training paradigms in both left-to-right generation and infilling. Code and models are available at https://github.com/microsoft/Meet-in-the-Middle.
Supplementary Material: pdf
Submission Number: 8168
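
The abstract above describes two directional LMs trained with an agreement term on their per-position token distributions. The following is a minimal sketch of that idea, not the authors' implementation: the symmetric-KL agreement penalty, the toy model, and all hyperparameters are illustrative assumptions rather than details taken from the paper or its repository.

```python
# Minimal sketch (assumed, not the paper's code): two autoregressive LMs read
# the same sequence in opposite directions, each is trained with the usual
# next-token loss, and an extra "agreement" term pulls their per-position
# token distributions together.
import torch
import torch.nn.functional as F
from torch import nn


class TinyCausalLM(nn.Module):
    """Toy causal LM: embedding -> one Transformer layer with a causal mask -> vocab logits."""

    def __init__(self, vocab_size: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        seq_len = tokens.size(1)
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device), diagonal=1
        )
        hidden = self.encoder(self.embed(tokens), mask=causal_mask)
        return self.head(hidden)  # (batch, seq_len, vocab)


def mim_loss(fwd_lm, bwd_lm, tokens, agreement_weight: float = 1.0):
    """Next-token NLL in both directions plus a distribution-agreement penalty."""
    # Left-to-right: position i predicts tokens[:, i + 1].
    fwd_logits = fwd_lm(tokens[:, :-1])
    # Right-to-left: run the second model on the reversed sequence, so
    # position j predicts the token *before* it in the original order.
    rev = tokens.flip(dims=[1])
    bwd_logits_rev = bwd_lm(rev[:, :-1])

    nll_fwd = F.cross_entropy(fwd_logits.transpose(1, 2), tokens[:, 1:])
    nll_bwd = F.cross_entropy(bwd_logits_rev.transpose(1, 2), rev[:, 1:])

    # Flip the backward logits back to the original order: bwd_logits[:, k]
    # now predicts tokens[:, k].  Both models predict the interior tokens
    # 1..L-2, so align their distributions on that shared span.
    bwd_logits = bwd_logits_rev.flip(dims=[1])
    log_p_fwd = F.log_softmax(fwd_logits[:, :-1], dim=-1)  # predicts tokens 1..L-2
    log_p_bwd = F.log_softmax(bwd_logits[:, 1:], dim=-1)   # predicts tokens 1..L-2
    agreement = 0.5 * (
        F.kl_div(log_p_fwd, log_p_bwd, log_target=True, reduction="batchmean")
        + F.kl_div(log_p_bwd, log_p_fwd, log_target=True, reduction="batchmean")
    )
    return nll_fwd + nll_bwd + agreement_weight * agreement


# Toy usage: two tiny models, a random batch, one combined loss.
if __name__ == "__main__":
    vocab = 100
    fwd, bwd = TinyCausalLM(vocab), TinyCausalLM(vocab)
    batch = torch.randint(0, vocab, (2, 16))
    loss = mim_loss(fwd, bwd, batch)
    loss.backward()
    print(float(loss))
```

The choice of a symmetric KL divergence as the agreement regularizer is one plausible instantiation of "encouraging the respective models to agree"; the actual regularizer and infilling procedure are described in the paper and the linked repository.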