CovScore: Evaluation of Multi-Document Abstractive Title Set Generation

ACL ARR 2024 June Submission 1732 Authors

14 Jun 2024 (modified: 07 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: This paper introduces CovScore, an automatic, reference-less methodology for evaluating thematic title sets extracted from a corpus of documents. While such extraction methods are widely used, evaluating their effectiveness remains an open question. Moreover, some existing practices rely heavily on slow and laborious human annotation procedures. Inspired by recently introduced LLM-based judge methods, we propose a novel methodology that decomposes title-set quality into five main metrics, each capturing a different aspect of evaluation. This framing simplifies and expedites the manual evaluation process and enables automatic, independent LLM-based evaluation. As a test case, we apply our approach to a corpus of Holocaust survivor testimonies, motivated both by its relevance to title set extraction and by the moral significance of this pursuit. We validate the methodology by experimenting with naturalistic and synthetic title set generation systems and comparing their performance using the proposed metrics.
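The LLM-as-judge framing described in the abstract can be illustrated with a short sketch. The snippet below is a minimal illustration, not the paper's implementation: the aspect names, prompt, and model choice are hypothetical placeholders (the paper's actual five metrics are defined in the full text), and it assumes the `openai` Python client with an API key configured in the environment.

```python
# Minimal sketch of a reference-less, LLM-as-judge evaluation loop in the
# spirit of the abstract. Aspect names, prompt, and model are hypothetical
# placeholders, not the paper's actual CovScore metrics.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical per-aspect metrics; the paper decomposes quality into five.
ASPECTS = ["coverage", "redundancy", "faithfulness", "fluency", "relevance"]

PROMPT = (
    "You are evaluating a set of thematic titles extracted from a document "
    "corpus. Rate the title set on '{aspect}' from 1 (poor) to 5 (excellent). "
    "Respond with a single integer.\n\n"
    "Corpus summary:\n{document}\n\n"
    "Title set:\n{titles}"
)

def judge_title_set(document: str, titles: list[str]) -> dict[str, int]:
    """Score one title set independently along each evaluation aspect."""
    scores = {}
    for aspect in ASPECTS:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model choice
            messages=[{
                "role": "user",
                "content": PROMPT.format(
                    aspect=aspect,
                    document=document,
                    titles="\n".join(f"- {t}" for t in titles),
                ),
            }],
            temperature=0,  # deterministic judging
        )
        scores[aspect] = int(response.choices[0].message.content.strip())
    return scores
```

Scoring each aspect in a separate call keeps the judgments independent, mirroring the decomposition the abstract describes; aggregating the per-aspect scores into an overall quality figure is left to the evaluator.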
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: corpus creation, benchmarking, automatic creation and evaluation of language resources, NLP datasets, automatic evaluation of datasets, evaluation methodologies, evaluation, metrics, statistical testing for evaluation
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 1732