Abstract: Summarization models have achieved impressive benchmark performance in recent years, while sharing common strengths and weaknesses. In this work, we focus on source-based features that affect most summarization models. We show that documents have specific properties that influence summarization performance. This leads us to ask: can we predict a document's summarization performance without actually generating a summary? We introduce PreSumm, a system designed to predict how well a general summarization system would perform on a given document.
Surprisingly, PreSumm's predictions correlate highly with both automatic metrics and human evaluations, supporting the hypothesis that certain global document features consistently affect model performance across systems.
We further demonstrate the model's utility in enabling efficient hybrid systems and in filtering outliers and noise from datasets.
Overall, our findings underscore the importance of source-text-driven factors in summarization performance and offer insights into the limitations of current systems that could serve as the basis for future improvements.
Paper Type: Long
Research Area: Summarization
Research Area Keywords: extractive summarization, abstractive summarization
Contribution Types: Data analysis
Languages Studied: English
Submission Number: 2265