Corpus and unsupervised benchmark: Towards Tagalog grammatical error correction

Nankai Lin, Hongbin Zhang, Menglan Shen, Yu Wang, Shengyi Jiang, Aimin Yang

Published: 2025, Last Modified: 17 Dec 2024 · Comput. Speech Lang. 2025 · CC BY-SA 4.0

Abstract: Highlights
• We construct the first Tagalog grammatical error correction (GEC) evaluation corpus.
• Our unsupervised GEC framework does not depend on any annotated data.
• Our proposed pseudo-perplexity scoring method estimates how likely a sentence is to be valid.
• Experimental results on two corpora verify the effectiveness of the proposed model.
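The highlights do not spell out how the pseudo-perplexity score is computed. The sketch below assumes the definition commonly used with masked language models: mask each token in turn, sum the log-probabilities the model assigns to the original tokens, and take PPPL(W) = exp(-(1/N) Σ_t log P(w_t | W with position t masked)). Lower scores suggest a more plausible sentence, so candidate corrections can be ranked by it. This is not necessarily the authors' exact formulation. `toy_masked_log_prob` and its probabilities are invented stand-ins for a real masked language model, and the helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer signature: log P(tokens[i] | all other tokens, position i masked).
MaskedLogProb = Callable[[Sequence[str], int], float]


def pseudo_log_likelihood(tokens: Sequence[str], masked_log_prob: MaskedLogProb) -> float:
    """Sum the log-probabilities of each token, masking one position at a time."""
    return sum(masked_log_prob(tokens, i) for i in range(len(tokens)))


def pseudo_perplexity(tokens: Sequence[str], masked_log_prob: MaskedLogProb) -> float:
    """exp of the negative mean pseudo-log-likelihood; lower means more plausible."""
    if not tokens:
        raise ValueError("cannot score an empty sentence")
    return math.exp(-pseudo_log_likelihood(tokens, masked_log_prob) / len(tokens))


def rank_candidates(candidates: List[List[str]], masked_log_prob: MaskedLogProb) -> List[List[str]]:
    """Order candidate sentences from most to least plausible under the scorer."""
    return sorted(candidates, key=lambda toks: pseudo_perplexity(toks, masked_log_prob))


# Toy stand-in for a masked LM (hypothetical): in-vocabulary tokens get probability 0.8,
# anything else gets 0.05. A real system would query a pretrained model instead.
TOY_VOCAB = {"kumain", "ang", "bata"}


def toy_masked_log_prob(tokens: Sequence[str], i: int) -> float:
    return math.log(0.8) if tokens[i] in TOY_VOCAB else math.log(0.05)


if __name__ == "__main__":
    candidates = [["kumain", "ang", "bta"], ["kumain", "ang", "bata"]]
    for cand in rank_candidates(candidates, toy_masked_log_prob):
        print(" ".join(cand), round(pseudo_perplexity(cand, toy_masked_log_prob), 3))
```

In this toy run the correctly spelled candidate scores 1.25 and ranks ahead of the misspelled one (about 3.15). With a real masked language model the same ranking step would choose among candidate corrections without needing annotated data.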