- Abstract: Text summarization has been a key language generation task for over 60 years. The field has advanced considerably during the past two years, benefiting from the proliferation of pre-trained Language Models (LMs). However, it remains constrained by two factors: 1) the absence of an effective automatic evaluation metric and 2) a lack of effective architectures for long document summarization. Our first contribution is to demonstrate that a set of semantic evaluation metrics (BERTScore, MoverScore and our novel metric, BARTScore) consistently and significantly outperforms ROUGE. Using these metrics, we then show that combining transformers with sparse self-attention is a successful method for long document summarization and is competitive with the state of the art. Finally, we show that sparsifying self-attention does not degrade model performance when transformers are used for summarization.
- Software: zip