CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

ACL ARR 2024 June Submission2628 Authors

15 Jun 2024 (modified: 13 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: This paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, an aspect that has received little attention in previous studies. To bridge this gap, we propose **CLEME2.0**, a reference-based evaluation strategy that characterizes GEC systems along four elementary dimensions: hit-correction, error-correction, under-correction, and over-correction. Together, these dimensions reveal the critical characteristics of GEC systems and help locate their drawbacks. Combining the dimensions to evaluate systems yields higher consistency with human judgements than other reference-based and reference-less metrics. Extensive experiments on 2 human judgement datasets and 6 reference datasets demonstrate the effectiveness and robustness of our method.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Grammatical Error Correction, Evaluation Metric
Contribution Types: Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English
Submission Number: 2628