Constructing Pseudo-parallel Swedish Sentence Corpora for Automatic Text SimplificationDownload PDF

Published: 20 Mar 2023, Last Modified: 17 Apr 2023NoDaLiDa 2023Readers: Everyone
Keywords: Automatic Text Simplification, Pseudo-Parallel Corpora, Monolingual Sentence Corpora
Abstract: Automatic text simplification (ATS) describes the automatic transformation of a text from a complex form to a less complex form. Many modern ATS techniques need large parallel corpora of standard and simplified text, but such data does not exist for many languages. One way to overcome this issue is to create pseudo-parallel corpora by dividing existing corpora into standard and simple parts. In this work, we explore the creation of Swedish pseudo-parallel monolingual corpora by the application of different feature representation methods, sentence alignment algorithms, and indexing approaches, on a large monolingual corpus. The different corpora are used to fine-tune a sentence simplification system based on BART, which is evaluated with standard evaluation metrics for automatic text simplification.
Student Paper: Yes, the first author is a student
4 Replies

Loading