Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation

ACL ARR 2024 June Submission4950 Authors

16 Jun 2024 (modified: 21 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: The correlation between automatic NLG evaluation metrics and human evaluation is the most critical criterion for assessing the capability of an evaluation metric. However, different grouping methods and choices of correlation coefficient yield at least 12 types of correlation measures, and little has long been known about their characteristics. This paper therefore illustrates the relationships between the different correlation measures and demonstrates, through statistical simulations, how the degree of data discretization affects their values. Additionally, we design algorithms to evaluate the discriminative power and ranking consistency of 12 correlation measures using empirical data from 6 datasets and 32 evaluation metrics, uncovering many interesting conclusions.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: statistical testing for evaluation, metrics, evaluation methodologies, evaluation
Contribution Types: Data analysis
Languages Studied: English
Submission Number: 4950