Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy

ACL ARR 2024 June Submission 1025 Authors

13 Jun 2024 (modified: 15 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Meta-evaluation of automatic evaluation metrics (assessing the evaluation metrics themselves) is crucial for accurately benchmarking natural language processing systems and has implications for scientific inquiry, production model development, and policy enforcement. While existing approaches to metric meta-evaluation make general statements about the absolute and relative quality of metrics across arbitrary system outputs, in practice metrics are applied in specific contexts, often to measure performance over a tightly constrained set of system outputs. For example, we may only be interested in evaluating a specific model or class of models. We introduce a method for contextual metric meta-evaluation that compares the local metric accuracy of evaluation metrics. Across translation, speech recognition, and ranking tasks, we demonstrate that local metric accuracies vary, in both absolute value and the relative ranking of metrics, as the evaluation context shifts.
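The abstract does not spell out how local metric accuracy is computed, but a common way to operationalize metric accuracy in meta-evaluation is pairwise agreement with human judgments; the sketch below illustrates that reading, restricted to a context-specific subset of outputs. This is an illustrative assumption, not the authors' released implementation, and all names (local_metric_accuracy, bleu_like_metric, the context variables) are hypothetical.

```python
# Illustrative sketch: "local metric accuracy" read as pairwise agreement with
# human judgments, computed only over outputs from one evaluation context
# (e.g., a single model or model family). Names are hypothetical.
from itertools import combinations
from typing import Callable, Sequence


def local_metric_accuracy(
    outputs: Sequence[str],
    human_scores: Sequence[float],
    metric: Callable[[str], float],
) -> float:
    """Fraction of output pairs whose metric ordering matches the human ordering."""
    metric_scores = [metric(o) for o in outputs]
    agree, total = 0, 0
    for i, j in combinations(range(len(outputs)), 2):
        if human_scores[i] == human_scores[j]:
            continue  # skip pairs the human judgments do not distinguish
        total += 1
        human_prefers_i = human_scores[i] > human_scores[j]
        metric_prefers_i = metric_scores[i] > metric_scores[j]
        agree += int(human_prefers_i == metric_prefers_i)
    return agree / total if total else float("nan")


# Contextual meta-evaluation would then compare the same metric on different
# context-restricted subsets rather than on all system outputs pooled together:
# acc_a = local_metric_accuracy(outputs_model_a, human_a, bleu_like_metric)
# acc_b = local_metric_accuracy(outputs_model_b, human_b, bleu_like_metric)
```

Under this reading, a metric that looks strong when accuracy is measured over arbitrary pooled outputs may still rank poorly within a particular context, which is the kind of shift in absolute value and relative ranking the abstract describes.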
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 1025