MLAE: Encoder-decoder Pre-training with Non-autoregressive Modeling

Anonymous

17 Apr 2023, ACL ARR 2023 April Blind Submission
Abstract: Encoder-decoder pre-training has proven successful in natural language processing. Most existing work on encoder-decoder pre-training is based on the autoregressive architecture. In this paper, we introduce MLAE, a new pre-training framework built on a non-autoregressive encoder-decoder architecture. It behaves like a masked autoencoder, reconstructing the masked language tokens in a non-autoregressive manner. Our model combines the best of both worlds: the advantages of encoder-only models on understanding tasks and the capabilities of autoregressive encoder-decoders on generation tasks. Extensive experiments show that MLAE outperforms strong baselines on various benchmarks, including language understanding, autoregressive generation, and non-autoregressive generation.
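To make the pre-training objective concrete, the sketch below illustrates the general idea of non-autoregressive masked reconstruction with an encoder-decoder Transformer. It is a minimal illustration in PyTorch, not the authors' implementation: the class name NonAutoregressiveMAE, the vocabulary size, the mask token id, and all hyperparameters are arbitrary assumptions. The key point it demonstrates is that the decoder predicts every masked position in a single parallel pass (no causal mask), and the loss is computed only on the masked tokens, as in a masked autoencoder.

```python
# Minimal sketch of non-autoregressive masked reconstruction with an
# encoder-decoder Transformer (illustrative only; not the paper's code).
import torch
import torch.nn as nn

# Assumed toy constants: BERT-like vocab size and [MASK] id.
VOCAB, D_MODEL, MASK_ID = 30522, 256, 103

class NonAutoregressiveMAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8,
            num_encoder_layers=4, num_decoder_layers=2,
            batch_first=True,
        )
        self.lm_head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, corrupted_ids, target_ids):
        # Encoder reads the corrupted (masked) sequence.
        src = self.embed(corrupted_ids)
        # Decoder also sees the corrupted sequence and predicts all
        # positions in parallel: no causal mask, hence non-autoregressive.
        tgt = self.embed(corrupted_ids)
        hidden = self.transformer(src, tgt)  # no tgt_mask => bidirectional decoder
        logits = self.lm_head(hidden)
        # Reconstruction loss only on masked positions (masked-autoencoder style).
        masked = corrupted_ids.eq(MASK_ID)
        return nn.functional.cross_entropy(logits[masked], target_ids[masked])

# Toy usage: mask ~15% of tokens and reconstruct them in one parallel pass.
ids = torch.randint(1000, VOCAB, (2, 16))
corrupted = ids.clone()
corrupted[torch.rand(ids.shape) < 0.15] = MASK_ID
model = NonAutoregressiveMAE()
print(model(corrupted, ids).item())
```

Because the decoder here is bidirectional, the same pre-trained weights can plausibly serve understanding tasks (encoder-only style) and parallel generation, while autoregressive generation would require a causal decoder mask at fine-tuning time; this is a design interpretation of the abstract, not a claim about the paper's exact setup.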
Paper Type: long
Research Area: Generation