Using a Pre-Trained Language Model for Context-Aware Error Detection and Correction in the Persian Language
Abstract: This paper presents Virastman, a Persian spell checker that detects and corrects both non-word and real-word errors in a sentence. Real-word errors are detected with a sequence-labeling model based on BERT, trained on a small artificially generated dataset. Errors are corrected with an unsupervised BERT-based model that computes the probability of each candidate word (including the detected word itself) in the sentence context. The most probable candidate is selected as the correction if conditions based on two thresholds, α and β, are satisfied. Our experiments on six distinct test sets show that the proposed method clearly outperforms the baselines in detecting and correcting real-word and non-word errors. Specifically, measured by the F0.5 metric, our approach yields an average improvement of 3.41% in error detection and a substantial average improvement of 15% in error correction over contemporary baselines, establishing it as the state of the art for Persian error detection and correction.
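To illustrate the correction step described above, the following minimal sketch scores candidate replacements for a flagged word with a masked language model and accepts the top candidate only when threshold conditions hold. The checkpoint name, the candidate list, and the exact way the thresholds α and β are applied are assumptions for illustration, not the paper's specification.

```python
# Minimal sketch: score candidate corrections for a flagged word with a masked LM.
# The model checkpoint, candidate generation, and threshold logic below are
# illustrative assumptions, not the exact method described in the paper.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "HooshvareLab/bert-base-parsbert-uncased"  # assumed Persian BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def score_candidates(tokens, idx, candidates):
    """Return P(candidate | context) for each single-token candidate at position idx."""
    masked = tokens.copy()
    masked[idx] = tokenizer.mask_token
    inputs = tokenizer(" ".join(masked), return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos[0]]
    probs = torch.softmax(logits, dim=-1)
    return {c: probs[tokenizer.convert_tokens_to_ids(c)].item() for c in candidates}

def correct_word(tokens, idx, candidates, alpha=0.1, beta=2.0):
    """Replace the flagged word only if the best candidate is probable enough (alpha)
    and sufficiently more probable than the original word (beta); both thresholds
    and their roles are assumed for this sketch."""
    scores = score_candidates(tokens, idx, candidates + [tokens[idx]])
    original_p = scores.pop(tokens[idx])
    best, best_p = max(scores.items(), key=lambda kv: kv[1])
    if best_p >= alpha and best_p >= beta * original_p:
        return best
    return tokens[idx]  # keep the original word if the thresholds are not met
```

In this sketch, α acts as an absolute confidence floor on the candidate's contextual probability and β as a relative margin over the original word; the paper's actual conditions on α and β may be defined differently.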
Paper Type: long
Research Area: NLP Applications
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models
Languages Studied: English, Persian