Abstract: Encoder-decoder pretraining has proven successful in natural language processing, but most existing work on encoder-decoder pretraining is based on the autoregressive architecture. In this paper, we introduce MLAE, a new pretraining framework built on a non-autoregressive encoder-decoder architecture. It behaves like a masked autoencoder, reconstructing the masked language tokens in a non-autoregressive manner. Our model combines the best of both worlds: the strength of encoder-only models on understanding tasks and the generation capabilities of autoregressive encoder-decoder models. Extensive experiments show that MLAE outperforms strong baselines on a range of benchmarks covering language understanding, autoregressive generation, and non-autoregressive generation.
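To make the described objective concrete, the sketch below illustrates one way a non-autoregressive masked-reconstruction encoder-decoder can be set up in PyTorch. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the class and function names (MaskedNonAutoregressiveAE, loss_fn), the layer sizes, and the choice to feed the masked input embeddings as the decoder query are all hypothetical.

```python
import torch
import torch.nn as nn


class MaskedNonAutoregressiveAE(nn.Module):
    """Illustrative sketch (not the paper's code): the encoder reads the
    partially masked sequence, and a shallow decoder predicts every masked
    token in parallel, i.e. without a causal mask (non-autoregressively)."""

    def __init__(self, vocab_size=32000, d_model=512, nhead=8,
                 num_encoder_layers=6, num_decoder_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_encoder_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_decoder_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, masked_ids):
        x = self.embed(masked_ids)          # (batch, seq_len, d_model)
        memory = self.encoder(x)
        # No causal tgt_mask: all positions are predicted simultaneously.
        h = self.decoder(tgt=x, memory=memory)
        return self.lm_head(h)              # (batch, seq_len, vocab_size)


def loss_fn(logits, target_ids, mask_positions):
    """Cross-entropy computed only on the masked positions, as in a
    masked-autoencoder-style objective. `mask_positions` is a boolean
    tensor of shape (batch, seq_len)."""
    return nn.functional.cross_entropy(
        logits[mask_positions], target_ids[mask_positions])
```

Because the decoder attends to all positions at once rather than left-to-right, a single forward pass reconstructs every masked token, which is what distinguishes this setup from an autoregressive encoder-decoder objective.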
Paper Type: long
Research Area: Generation