ThinkGEC: A Cognitive and Pedagogical Framework for LLM-based Grammatical Error Correction

ACL ARR 2026 January Submission 4664 Authors

05 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Everyone | CC BY 4.0
Keywords: GEC, LLM
Abstract: LLMs are increasingly used for learner-facing writing support, yet grammatical error correction (GEC) lacks cognitively aligned training, pedagogically curated data, and interpretable feedback. We present ThinkGEC, a three-stage framework grounded in Corder’s identification–description–explanation paradigm, comprising knowledge elicitation from expert annotations, knowledge injection via supervised fine-tuning, and explanation through GRPO-guided self-revision. To support this framework, we release De10K, an education-oriented German GEC corpus containing 2,899 essays and 14,330 expert-annotated errors across diverse topics and proficiency levels. Experiments demonstrate that ThinkGEC substantially outperforms strong baselines, improves precision, mitigates over-correction, and generalizes to held-out, semantically driven error types. Further analysis examines model scale and reinforcement design and shows that the value of reasoning trajectories depends on error complexity: they benefit structural repairs but prove redundant for surface-level errors. ThinkGEC delivers interpretable, pedagogically aligned rationales, advancing both the accuracy and educational value of LLM-based GEC. Our code and dataset are available at: https://anonymous.4open.science/r/ThinkGEC-04E7
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: NLP Applications, Resources and Evaluation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: German
Submission Number: 4664