Overview of NLPCC 2025 Shared Task 5: Chinese Government Text Correction with Knowledge Bases

Published: 16 Nov 2025, Last Modified: 10 Jan 2026 | NLPCC 2025 | Everyone | CC BY-NC-ND 4.0
Abstract: In recent years, Chinese text error correction has advanced rapidly, with machine learning-based correction algorithms significantly improving performance. However, existing research often overlooks the integration of Knowledge Bases (KBs) to guide error correction, despite their potential value in rectifying critical factual errors or dynamically adjusting correction results as the KBs are updated. In this paper, we develop specialized KBs and datasets for the automatic error correction of Chinese government documents. The datasets are built from authentic news corpora and real-world user inputs, and reflect actual user needs. Furthermore, we present KB-oriented metrics that evaluate correction performance on knowledge-related terms. We report baseline performance for several Large Language Models (LLMs), including DeepSeek, Qwen, GLM, and Baichuan, chosen for their strong language understanding and reasoning capabilities, and then describe the methods and results of the five systems that participated in the shared task.