TiBERT: A Non-autoregressive Pre-trained Model for Text Editing

Published: 01 Jan 2023, Last Modified: 19 Feb 2025 · NLPCC (3) 2023 · CC BY-SA 4.0
Abstract: Text editing is the task of producing new sentences by altering existing text through operations such as replacement, insertion, and deletion. Two commonly used approaches are Seq2Seq and sequence labeling: Seq2Seq inference can be slow, while sequence labeling struggles with multi-token insertion. To address these issues, we propose TiBERT, a novel pre-trained model designed specifically for text editing tasks. TiBERT handles insertion and deletion by adjusting the length of the hidden representation, and is pre-trained with a denoising task on a large corpus. As a result, TiBERT offers both fast inference and improved generation quality. We evaluate the model on grammatical error correction, text simplification, and Chinese spelling check. Experimental results show that TiBERT is faster at inference and outperforms other pre-trained models on these text editing tasks.
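The abstract's core mechanism, adjusting the length of the hidden representation to realize insertions and deletions in a single parallel pass, resembles fertility-based designs from non-autoregressive translation. The sketch below is a minimal illustration under that assumption: the class name `LengthAdjustEditor`, the per-token fertility head, and all dimensions are hypothetical placeholders, not TiBERT's actual architecture.

```python
import torch
import torch.nn as nn

class LengthAdjustEditor(nn.Module):
    """Illustrative non-autoregressive editor (NOT TiBERT's real design).

    A per-token "fertility" head decides how many output slots each source
    token maps to (0 = delete, 1 = keep, k > 1 = insert k-1 new tokens).
    Hidden states are duplicated/dropped accordingly, then all output
    tokens are predicted in one parallel decoding step.
    """

    def __init__(self, vocab_size=30522, d_model=768, max_fertility=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True),
            num_layers=2)
        self.fertility = nn.Linear(d_model, max_fertility + 1)  # classes 0..max
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids):
        h = self.encoder(self.embed(src_ids))      # (B, L, D)
        fert = self.fertility(h).argmax(-1)        # (B, L) copies per token
        adjusted = []
        for b in range(src_ids.size(0)):
            # repeat_interleave realizes deletions (0 copies) and
            # insertions (>1 copy) as one length adjustment
            adjusted.append(h[b].repeat_interleave(fert[b], dim=0))
        h_adj = nn.utils.rnn.pad_sequence(adjusted, batch_first=True)
        return self.lm_head(h_adj)                 # parallel token logits

# Usage: a single forward pass yields logits over a length-adjusted output.
model = LengthAdjustEditor()
logits = model(torch.randint(0, 30522, (2, 8)))
print(logits.shape)  # (2, adjusted_len, vocab_size)
```

In such a design the single parallel pass is what makes inference fast relative to Seq2Seq decoding, and the fertility values above 1 are what allow multi-token insertion, the case the abstract says plain sequence labeling struggles with.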