TaskSum: Task-Driven Extractive Text Summarization for Long News Documents Based on Reinforcement Learning

Published: 2022, Last Modified: 11 Nov 2025, DASFAA (3) 2022, CC BY-SA 4.0
Abstract: A popular, state-of-the-art family of extractive summarization methods explores pre-trained language models through reinforcement learning (RL). Despite promising results, existing RL-based methods suffer from three drawbacks. First, they often adopt sparse reward schemes that reward only some of the extracted sentences, so salient sentences are neglected. Second, they often treat summarization as an independent task and neglect its latent connections to other downstream tasks, which could in turn provide useful hints to guide the upstream summarization task. Third, the input length in most summarization methods is restricted by the pre-trained language model used. To address these problems, we propose TaskSum, a novel RL-based Seq2Seq extractive summarization model that combines extractive text summarization with multiple associated tasks via a dense reward signal scheme. Moreover, we implement a BERT-based hierarchical encoder to effectively encode documents of arbitrary length. Empirical results demonstrate that TaskSum overcomes the above-mentioned drawbacks of existing RL-based summarization methods and achieves significantly better results on long documents.
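The contrast between sparse and dense reward schemes can be illustrated with a minimal sketch. This is not TaskSum's reward: the abstract does not specify the reward function, and the downstream-task signals it mentions are not modeled here. The function names (`unigram_f1`, `sparse_rewards`, `dense_rewards`) are hypothetical, and the toy unigram-overlap F1 is an assumed stand-in for a real summary-quality metric such as ROUGE.

```python
from collections import Counter


def unigram_f1(candidate_tokens, reference_tokens):
    """Toy stand-in for a summary-quality score: clipped unigram-overlap F1."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def sparse_rewards(extracted, reference):
    """Sparse scheme (assumed form): only the final extraction step is rewarded."""
    tokens = [tok for sent in extracted for tok in sent.split()]
    rewards = [0.0] * len(extracted)
    if extracted:
        rewards[-1] = unigram_f1(tokens, reference.split())
    return rewards


def dense_rewards(extracted, reference):
    """Dense scheme (assumed form): each step is rewarded with the marginal
    change in score caused by adding that sentence to the summary so far."""
    ref_tokens = reference.split()
    rewards, tokens, previous = [], [], 0.0
    for sent in extracted:
        tokens.extend(sent.split())
        score = unigram_f1(tokens, ref_tokens)
        rewards.append(score - previous)
        previous = score
    return rewards


if __name__ == "__main__":
    picked = ["the cat sat", "on the mat"]
    gold = "the cat sat on the mat"
    print(sparse_rewards(picked, gold))
    print(dense_rewards(picked, gold))
```

In this sketch the dense rewards sum to the single sparse reward, so both schemes score the finished summary identically; the difference is that the dense variant attributes credit to every extracted sentence instead of only the last one.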