- Abstract: We present a new approach to defining a sequence-level loss function for training a summarizer: a secondary encoder-decoder serves as the loss function, alleviating a shortcoming of word-level training for sequence outputs. The technique is based on the intuition that a good summary contains the most essential information from the original article, and therefore should itself be a good input sequence, in lieu of the original, from which a summary can be generated. We present experimental results applying this additional loss function to a general abstractive summarizer on a news summarization dataset. The result is an improvement in the ROUGE metric and an especially large improvement in human evaluations, suggesting performance competitive with specialized state-of-the-art models.
- Code: https://github.com/iclr2020recoder/code_for_paper
- Keywords: encoder-decoder, summarization, loss functions
- TL;DR: We present the use of a secondary encoder-decoder as a loss function to help train a summarizer.
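The core idea in the abstract can be sketched as follows: the primary summarizer is trained with its usual word-level loss, and a secondary encoder-decoder (the "recoder") re-summarizes the generated summary; its loss against the same reference is added as an auxiliary term. This is a minimal, hypothetical illustration with the model internals stubbed out as precomputed token probabilities; the function names and the weighting scheme (`lam`) are assumptions, not the paper's exact formulation.

```python
import numpy as np

def cross_entropy(pred_probs, target_ids):
    """Mean negative log-likelihood of the target tokens under the
    model's per-step probability distributions."""
    return float(-np.mean(np.log([p[t] for p, t in zip(pred_probs, target_ids)])))

def combined_loss(primary_probs, recoder_probs, reference_ids, lam=0.5):
    """Word-level loss on the summarizer's output, plus a weighted loss
    on the recoder's attempt to summarize the generated summary.
    Both terms are scored against the same reference summary."""
    primary = cross_entropy(primary_probs, reference_ids)
    secondary = cross_entropy(recoder_probs, reference_ids)
    return primary + lam * secondary

# Toy example: vocabulary of 3 tokens, reference summary = tokens [0, 2].
reference = [0, 2]
# Per-step output distributions from the (stubbed) summarizer and recoder.
primary_probs = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.2, 0.7]])
recoder_probs = np.array([[0.5, 0.3, 0.2],
                          [0.2, 0.3, 0.5]])

loss = combined_loss(primary_probs, recoder_probs, reference)
```

With `lam=0` this reduces to ordinary word-level training; raising `lam` increasingly rewards summaries that remain good summarization inputs in their own right.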