Ensembling and Score-Based Filtering in Sentence Alignment for Automatic Simplification of German Texts
Abstract: Among the well-known accessibility services for audiovisual media are subtitling for the deaf and hard-of-hearing, audio description, and sign language interpreting. More recently, automatic text simplification has emerged as a topic in the context of media accessibility, with research often approaching the task as a case of (sentence-based) monolingual machine translation. This approach relies on large amounts of high-quality parallel data, which is why monolingual sentence alignment has gained momentum. Alignment for text simplification is a complex task: alignments often take the form n:m, with n source sentences corresponding to m simplified sentences, in contrast to the standard 1:1 case in machine translation. In this contribution, we evaluate the performance of different alignment methods against a human-created gold standard of standard German/simplified German sentence alignments drawn from a number of parallel corpora. Two of the corpora contain multiple levels of simplification.