Compressing Transformer-Based Sequence to Sequence Models With Pre-trained Autoencoders for Text Summarization

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted
Keywords: Transformer, Automatic Text Summarization, sequence-to-sequence, Compression
Abstract: We propose a technique to reduce the number of decoder parameters in a sequence-to-sequence (seq2seq) architecture for automatic text summarization. The approach trains an AutoEncoder (AE) on top of a pre-trained encoder to reduce the dimension of the encoder's output, which in turn allows a significantly smaller decoder. The ROUGE score is used to measure the effectiveness of this method across four latent-space dimensionality reductions: 96%, 66%, 50%, and 44%. Several well-known frozen pre-trained encoders (BART, BERT, and DistilBERT) are tested, each paired with its respective frozen pre-trained AE, to assess whether the reduced-dimension latent space can support the training of a 3-layer transformer decoder. We also repeat the same experiments on a small transformer model trained for text summarization. This study shows a 5% increase in the R-1 score while reducing the model size by 44% using the DistilBERT encoder, and competitive scores for all the other models together with substantial size reductions.
One-sentence Summary: We propose an approach to compress the encoder latent representation of transformer-based sequence-to-sequence models with minimal loss.
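The following is a minimal sketch (not the authors' code) of the architecture the abstract describes: a frozen pre-trained encoder, a linear autoencoder bottleneck that shrinks the hidden dimension, and a small 3-layer transformer decoder trained on the reduced representation. The checkpoint name, the 50% latent reduction (768 to 384), the linear AE layout, and all hyperparameters are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

ENC_NAME = "distilbert-base-uncased"        # assumed encoder checkpoint
D_ENC, D_LATENT, VOCAB = 768, 384, 30522    # assumed 50% dimensionality reduction


class BottleneckAE(nn.Module):
    """Maps encoder states to a lower-dimensional latent space
    (the decode path is only used during AE reconstruction pre-training)."""
    def __init__(self, d_in, d_latent):
        super().__init__()
        self.enc = nn.Linear(d_in, d_latent)
        self.dec = nn.Linear(d_latent, d_in)

    def forward(self, h):
        z = self.enc(h)
        return z, self.dec(z)


class CompressedSeq2Seq(nn.Module):
    """Frozen encoder -> frozen AE bottleneck -> small trainable decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(ENC_NAME)
        for p in self.encoder.parameters():      # encoder stays frozen
            p.requires_grad = False
        self.ae = BottleneckAE(D_ENC, D_LATENT)  # pre-trained separately, then frozen
        layer = nn.TransformerDecoderLayer(d_model=D_LATENT, nhead=6, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)  # 3-layer decoder
        self.embed = nn.Embedding(VOCAB, D_LATENT)
        self.lm_head = nn.Linear(D_LATENT, VOCAB)

    def forward(self, src_ids, src_mask, tgt_ids):
        with torch.no_grad():
            memory = self.encoder(src_ids, attention_mask=src_mask).last_hidden_state
        z, _ = self.ae(memory)                   # reduced-dimension memory for the decoder
        tgt = self.embed(tgt_ids)
        t = tgt_ids.size(1)
        causal = torch.triu(                     # standard causal mask for autoregressive decoding
            torch.full((t, t), float("-inf"), device=tgt.device), diagonal=1
        )
        out = self.decoder(tgt, z, tgt_mask=causal)
        return self.lm_head(out)                 # per-token vocabulary logits
```

Because the decoder's model dimension is D_LATENT rather than D_ENC, every decoder sub-layer (attention, feed-forward, embeddings, output head) shrinks accordingly, which is where the parameter savings in the abstract would come from under this reading.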