MLAE: Encoder-decoder Pre-training with Non-autoregressive Modeling

Anonymous

17 Feb 2023 (modified: 05 May 2023) · ACL ARR 2023 February Blind Submission
Abstract: Encoder-decoder pre-training has proven successful in natural language processing. Most existing work on encoder-decoder pre-training is based on the autoregressive architecture. In this paper, we introduce MLAE, a new pre-training framework based on a non-autoregressive encoder-decoder architecture. It behaves like a masked autoencoder, reconstructing the masked language tokens in a non-autoregressive manner. Our model combines the best of both worlds: the advantages of encoder-only models on understanding tasks and the capabilities of autoregressive encoder-decoders on generation tasks. Extensive experiments show that MLAE outperforms strong baselines on various benchmarks, covering language understanding, autoregressive generation, and non-autoregressive generation.
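
The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the general idea it describes: an encoder-decoder that, like a masked autoencoder, reconstructs masked tokens in a single parallel (non-autoregressive) decoding pass, with the loss computed only on masked positions. All module sizes, the masking ratio, and the names (`NonAutoregressiveMAE`, `MASK_ID`, etc.) are assumptions for illustration, not the authors' MLAE implementation.

```python
# Minimal sketch (not the paper's code): masked-autoencoder-style
# encoder-decoder pre-training with parallel (non-autoregressive) decoding.
import torch
import torch.nn as nn

VOCAB, D_MODEL, MASK_ID, PAD_ID = 1000, 64, 1, 0  # illustrative sizes/ids

class NonAutoregressiveMAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD_ID)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True),
            num_layers=2)
        # No causal mask in the decoder: every position is predicted
        # in one parallel pass instead of token by token.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(D_MODEL, nhead=4, batch_first=True),
            num_layers=2)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, corrupted_ids):
        memory = self.encoder(self.embed(corrupted_ids))
        # Decoder input is the same corrupted sequence (mask tokens included).
        hidden = self.decoder(self.embed(corrupted_ids), memory)
        return self.lm_head(hidden)

# Toy pre-training step: mask ~15% of tokens and reconstruct them.
tokens = torch.randint(2, VOCAB, (4, 16))
mask = torch.rand(tokens.shape) < 0.15
corrupted = tokens.masked_fill(mask, MASK_ID)
logits = NonAutoregressiveMAE()(corrupted)
loss = nn.functional.cross_entropy(
    logits[mask], tokens[mask])  # loss only on masked positions
loss.backward()
print(float(loss))
```

At inference time, a setup like this can be read out in two ways, matching the abstract's claims: encoder-only style for understanding tasks, or decoder-based generation, which is what distinguishes it from purely autoregressive encoder-decoder pre-training.
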
Paper Type: long
Research Area: Generation