Divide-and-Conquer Text Simplification by Scalable Data Enhancement

Anonymous

16 Nov 2021 (modified: 05 May 2023), ACL ARR 2021 November Blind Submission, Readers: Everyone
Abstract: Text simplification, which aims to reduce reading difficulty, can be decomposed into four discrete rewriting operations: substitution, deletion, reordering, and splitting. However, because of a large distribution discrepancy between existing training data and human-annotated data, models may learn improper operations, leading to poor generalization. To bridge this gap, we propose a novel data enhancement method, Simsim, that generates training pairs by simulating specific simplification operations. Experiments show that models trained with Simsim outperform multiple strong baselines and achieve better SARI scores on the Turk and Asset datasets. The newly constructed dataset Simsim is available at *.