A Few Good Sentences: Content Selection for Abstractive Text Summarization

Published: 01 Jan 2023, Last Modified: 11 Oct 2025 · ECML/PKDD (4) 2023 · CC BY-SA 4.0
Abstract: Abstractive text summarization has been of research interest for decades. Neural approaches, in particular recent transformer-based methods, have demonstrated promising performance in generating summaries with novel words and paraphrases. Despite producing more fluent summaries, these approaches may still select summary-worthy content poorly. In these methods, extractive content selection depends largely on the reference summary, with little to no focus on identifying the summary-worthy segments (SWORTS) in a reference-free setting. In this work, we leverage three metrics, namely informativeness, relevance, and redundancy, in selecting the SWORTS. We propose a novel topic-informed and reference-free method to rank the sentences in the source document by their importance. We demonstrate the effectiveness of SWORTS selection in different settings such as fine-tuning, few-shot tuning, and zero-shot abstractive text summarization. We observe that self-training and cross-training a pre-trained model with SWORTS-selected data yields performance competitive with the pre-trained model. Furthermore, a small amount of SWORTS-selected data is sufficient for domain adaptation, compared with fine-tuning on the entire training dataset with no content selection. In contrast to training a model on the source dataset with no content selection, we observe a significant reduction in training time with SWORTS, which further underlines the importance of content selection for training an abstractive text summarization model.
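To make the three selection criteria concrete, the sketch below shows one plausible reference-free sentence ranker that trades off relevance against redundancy in an MMR-style loop. All names (`select_sworts`, the term-frequency cosine scoring, the `lam` trade-off weight) are illustrative assumptions for this sketch and are not the paper's actual metric definitions or algorithm.

```python
# Hypothetical sketch of reference-free, summary-worthy sentence selection.
# Relevance = similarity to the whole document's term distribution;
# redundancy = similarity to already-selected sentences (MMR-style).
# These are stand-in definitions, not the paper's actual metrics.
from collections import Counter
import math

def tf_vector(sentence):
    """Bag-of-words term-frequency vector for one sentence."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def select_sworts(sentences, k=3, lam=0.7):
    """Greedily pick k sentences, balancing relevance and redundancy."""
    doc_vec = tf_vector(" ".join(sentences))
    vecs = [tf_vector(s) for s in sentences]
    relevance = [cosine(v, doc_vec) for v in vecs]  # relevance to full document
    selected_idx = []
    while len(selected_idx) < min(k, len(sentences)):
        best, best_score = None, float("-inf")
        for i, v in enumerate(vecs):
            if i in selected_idx:
                continue
            # Redundancy: worst-case overlap with sentences already chosen.
            red = max((cosine(v, vecs[j]) for j in selected_idx), default=0.0)
            score = lam * relevance[i] - (1 - lam) * red
            if score > best_score:
                best, best_score = i, score
        selected_idx.append(best)
    return [sentences[i] for i in selected_idx]
```

The greedy trade-off penalizes picking a second sentence that closely repeats an already-selected one, which is one common way to operationalize a redundancy criterion.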