Encode Errors: Representational Retrieval of In-Context Demonstrations for Multilingual Grammatical Error Correction
Abstract: Grammatical Error Correction (GEC) is the task of detecting and correcting grammatical errors in text. While large language models (LLMs) with in-context learning (ICL) capabilities have made significant progress on many natural language processing (NLP) tasks, their few-shot performance on GEC remains suboptimal. This is mainly due to the difficulty of retrieving in-context demonstrations that capture error patterns rather than mere semantic similarity. In this paper, we show that LLMs inherently encode information about grammatical errors in their internal states. From these states we extract the Grammatical Error Representation (GER), an informative and semantically neutral encoding of grammatical errors. Our GER-based retrieval method significantly boosts ICL performance on multilingual GEC datasets, improving correction precision. For high-resource languages, our results with 8B-parameter open-source models match those of closed-source models such as DeepSeek-V2.5 and GPT-4o-mini. For low-resource languages, our $F_{0.5}$ scores exceed the baseline by a factor of 1.25. The method offers a more precise and resource-efficient solution for multilingual GEC and a promising direction for interpretable GEC research. The code will be made available upon acceptance.
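The abstract describes retrieving in-context demonstrations by similarity in a representation space derived from the model's internal states, rather than by surface semantic similarity. The paper's actual GER extraction procedure is not given here; the following is a minimal sketch of the retrieval step only, assuming demonstration and query representations have already been computed as fixed-size vectors (the function name and shapes are illustrative, not the authors' API).

```python
import numpy as np

def retrieve_demonstrations(query_vec, demo_vecs, k=3):
    """Return indices of the k demonstrations whose (hypothetical) error
    representations are most cosine-similar to the query representation.

    query_vec: shape (d,) vector for the input sentence
    demo_vecs: shape (n, d) matrix, one row per candidate demonstration
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = demo_vecs / np.linalg.norm(demo_vecs, axis=1, keepdims=True)
    sims = d @ q  # cosine similarity of each demonstration to the query
    return np.argsort(-sims)[:k]  # indices of top-k most similar demos

# Illustrative usage with toy 2-D vectors:
demos = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
query = np.array([1.0, 0.0])
print(retrieve_demonstrations(query, demos, k=2))  # → [0 2]
```

The retrieved demonstrations would then be placed in the few-shot prompt; the key design choice the abstract highlights is that the vectors encode error patterns, so nearest neighbors share error types rather than topics.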
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: grammatical error correction, few-shot learning, interpretability, multilingual evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English, German, Romanian, Estonian
Submission Number: 7512