Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics


ACL ARR 2024 June Submission3315 Authors

16 Jun 2024 (modified: 02 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0

Abstract: Automatic metrics are used as proxies to evaluate abstractive summarization systems when human annotations are too expensive. To be useful, these metrics should be fine-grained, correlate highly with human annotations, and ideally be independent of reference quality; however, most standard evaluation metrics for summarization are reference-based, and existing reference-free metrics correlate poorly with relevance, especially on summaries of longer documents. In this paper, we introduce a reference-free metric that correlates well with human-evaluated relevance while being very cheap to compute. We show that this metric can also be used alongside reference-based metrics to improve their robustness in settings with low-quality references.
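The abstract judges metrics by how well they correlate with human annotations. A common way to quantify this is a rank correlation such as Kendall's tau between metric scores and human relevance ratings for the same summaries. The abstract does not name the statistic it uses, so treat this as an assumption. The sketch below is a minimal, dependency-free Kendall's tau-a. It is not the paper's metric, and the function name and the scores are hypothetical.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Hypothetical helper for illustration; tied pairs count as neither
    concordant nor discordant (tau-a does not adjust for ties).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # Same sign of difference in both rankings -> the pair agrees.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical metric scores and human relevance ratings for four summaries.
metric_scores = [0.2, 0.5, 0.4, 0.9]
human_relevance = [1, 2, 3, 4]
print(round(kendall_tau(metric_scores, human_relevance), 3))  # 0.667
```

Here five of the six summary pairs are ranked in the same order by the metric and by the human ratings, and one pair is swapped, giving (5 - 1) / 6. For data with many ties, a tie-adjusted variant such as tau-b (for example `scipy.stats.kendalltau`) is usually preferred.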

Paper Type: Short

Research Area: Resources and Evaluation

Research Area Keywords: evaluation, metrics, abstractive summarisation

Languages Studied: English

Submission Number: 3315
