BART-light: One Decoder Layer Is Enough

Anonymous

17 Sept 2021 (modified: 05 May 2023) · ACL ARR 2021 September Blind Submission · Readers: Everyone
Abstract: BART (Lewis et al., 2020), an encoder-decoder transformer language model (LM), has achieved state-of-the-art results on several natural language generation and understanding tasks. Like other pretrained encoder-decoder LMs, it uses the same number of hidden layers in the encoder and the decoder. In this paper, we show that one can easily remove all but one or two decoder layers for text generation tasks, and even remove the whole decoder for classification tasks, with little to no loss in performance. Our study demonstrates that a shallow decoder is sufficient for most tasks when a deep encoder is used.
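The layer-removal step described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, assuming the Hugging Face transformers implementation of BART rather than the authors' code; the checkpoint name, the choice to keep only the first decoder layer, and the subsequent fine-tuning are illustrative assumptions on our part.

```python
# Minimal sketch: truncate a pretrained BART decoder to a single layer.
# Assumes the Hugging Face `transformers` BART implementation; this is not
# the authors' released code, and "facebook/bart-large" plus keeping the
# *first* decoder layer are illustrative choices only.
import torch.nn as nn
from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

keep = 1  # number of decoder layers to retain (the paper keeps one or two)
model.model.decoder.layers = nn.ModuleList(model.model.decoder.layers[:keep])
model.config.decoder_layers = keep  # keep the config consistent with the module

# The deep (12-layer) encoder is left untouched; only the decoder is shallow.
print(len(model.model.encoder.layers), len(model.model.decoder.layers))
```

The truncated model would then be fine-tuned on the downstream generation task as usual; for classification, the abstract's stronger variant drops the decoder entirely and classifies on top of the encoder.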