Active Learning for Abstractive Text Summarization via LLM-Determined Curriculum and Certainty Gain Maximization

Published: 01 Jan 2024, Last Modified: 21 May 2025 · EMNLP (Findings) 2024 · CC BY-SA 4.0
Abstract: For abstractive text summarization, laborious data annotation and time-consuming model training have become two high walls hindering further progress. Active learning, which selects a few informative instances for annotation and model training, sheds light on solving these issues. However, only a few active learning studies focus on abstractive text summarization, and they suffer from low stability, effectiveness, and efficiency. To address these problems, we propose a novel LLM-determined curriculum active learning framework. First, we design a prompt that asks a large language model to rate the difficulty of instances, which guides the model to train on instances ordered from easier to harder. Second, we design a novel active learning strategy, Certainty Gain Maximization, which selects instances whose distribution aligns well with the overall distribution. Experiments show that our method improves the stability, effectiveness, and efficiency of abstractive text summarization backbones.
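The abstract gives no implementation details, but the first component can be illustrated. Below is a minimal sketch of the LLM difficulty-rating step, assuming an OpenAI-style chat API; the model name, prompt wording, and the `rate_difficulty` / `curriculum_order` helpers are illustrative assumptions, not the authors' code.

```python
# Sketch of LLM-determined curriculum ordering: ask an LLM to rate each
# training instance's difficulty, then sort the pool from easy to hard.
# Prompt text and model choice are hypothetical, not from the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Rate how difficult the following document is to summarize, "
    "on a scale from 1 (easiest) to 5 (hardest). "
    "Reply with a single integer.\n\nDocument:\n{doc}"
)

def rate_difficulty(doc: str) -> int:
    """Ask the LLM for a 1-5 difficulty score for one document."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": PROMPT.format(doc=doc)}],
    )
    # Assumes the model follows the instruction and replies with one integer.
    return int(resp.choices[0].message.content.strip())

def curriculum_order(docs: list[str]) -> list[str]:
    """Sort the training pool from easier to harder instances."""
    return sorted(docs, key=rate_difficulty)
```

Likewise, here is one plausible reading of the Certainty Gain Maximization selection step, under the assumption that "aligning with the overall distribution" can be approximated by cosine similarity between each candidate's embedding and the pool's mean embedding; the encoder choice, the similarity-based gain, and the top-k selection are guesses for illustration, and the paper's actual objective may differ.

```python
# Rough sketch of distribution-aligned instance selection: pick the k
# unlabeled documents whose embeddings align best with the pool's centroid.
import numpy as np
from sentence_transformers import SentenceTransformer

def select_batch(pool: list[str], k: int = 16) -> list[str]:
    """Pick the k documents closest to the pool's mean embedding."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical choice
    emb = encoder.encode(pool, normalize_embeddings=True)  # (N, d), unit norm
    centroid = emb.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    gain = emb @ centroid          # cosine alignment with the overall pool
    top = np.argsort(-gain)[:k]    # highest-alignment instances first
    return [pool[i] for i in top]
```

In a full active-learning loop under this reading, the selected batch would be annotated, added to the training set, and presented to the summarization backbone in the curriculum order above.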
