Keywords: language modeling, pre-training, deep learning, NLP
TL;DR: We propose a new pre-training objective for autoregressive LMs and show that models trained this way achieve better perplexity and improve on many other metrics
Abstract: Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion, predicting the next token from the preceding ones. However, this ignores that the full sequence is available during training.
In this paper, we introduce ``Meet in the Middle'' (MIM), a new pre-training paradigm that improves data efficiency by training in two directions, left-to-right and right-to-left, and encouraging the respective models to agree on their token distributions for each position. While the primary outcome is an improved left-to-right LM, we also obtain secondary benefits on the infilling task: there, we leverage the two pre-trained directions to propose an infilling procedure that builds the completion simultaneously from both sides. We conduct extensive experiments on both programming and natural languages and show that MIM significantly surpasses existing pre-training paradigms, in left-to-right generation as well as infilling.
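The bidirectional agreement objective could be sketched roughly as follows. This is a minimal, illustrative sketch only, assuming pre-aligned per-position logits from the two directions and a symmetric KL divergence as the agreement term; the function name, hyperparameters, and exact divergence are assumptions, not the paper's implementation.

```python
import torch.nn.functional as F


def mim_style_loss(fwd_logits, bwd_logits, targets, agreement_weight=1.0):
    """Hypothetical sketch of a MIM-style objective: standard next-token
    losses for the left-to-right and right-to-left models, plus a term
    encouraging their per-position token distributions to agree
    (here a symmetric KL; the paper's exact agreement term may differ).

    fwd_logits, bwd_logits: (batch, seq_len, vocab) predictions for the
        same target positions from the two directions.
    targets: (batch, seq_len) gold token ids.
    """
    vocab = fwd_logits.size(-1)
    nll_fwd = F.cross_entropy(fwd_logits.reshape(-1, vocab), targets.reshape(-1))
    nll_bwd = F.cross_entropy(bwd_logits.reshape(-1, vocab), targets.reshape(-1))

    log_p = F.log_softmax(fwd_logits, dim=-1).reshape(-1, vocab)
    log_q = F.log_softmax(bwd_logits, dim=-1).reshape(-1, vocab)
    # Symmetric KL between the two directions' per-token distributions.
    agree = 0.5 * (
        F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
        + F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
    )
    return nll_fwd + nll_bwd + agreement_weight * agree
```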
Code and models available at https://github.com/microsoft/Meet-in-the-Middle
Supplementary Material: pdf
Submission Number: 8168