Building Sequence-to-Sequence Document Revision Models from Matched and Multiple Partially-Matched Datasets

Anonymous

16 Jan 2022 (modified: 05 May 2023)
ACL ARR 2022 January Blind Submission
Readers: Everyone
Abstract: This paper defines the document revision task and proposes a novel modeling method that can utilize not only a matched dataset but also multiple partially-matched datasets. In the document revision task, we aim to consider multiple perspectives on writing support simultaneously. To this end, it is important not only to correct grammatical errors but also to improve readability and perspicuity, through means such as conjunction insertion and sentence reordering. However, it is difficult to prepare a sufficiently large matched dataset for the document revision task, since the task has to consider multiple perspectives simultaneously. To mitigate this problem, our idea is to utilize not only a limited matched dataset but also various partially-matched datasets that handle individual perspectives, e.g., correcting grammatical errors or inserting conjunctions. Since suitable partially-matched datasets have either been published or can easily be made, we expect that a large amount of such data can be prepared. To utilize these multiple datasets effectively, our proposed modeling method incorporates "on-off" switches into sequence-to-sequence modeling to distinguish between the matched dataset and the individual partially-matched datasets. Experiments using our created document revision datasets demonstrate the effectiveness of the proposed method.
Paper Type: long
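
The abstract does not say how the "on-off" switches enter the sequence-to-sequence model. One common way to realize such conditioning, shown here purely as an illustrative sketch and not as the authors' method, is to prepend one control token per perspective to the source text, set to "on" for the perspectives a given dataset covers. The perspective names, the token format, and the helper functions below are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical perspective inventory, taken from the examples named in the abstract.
PERSPECTIVES: List[str] = ["grammar", "conjunction", "reordering"]


def switch_tokens(active: Set[str]) -> List[str]:
    """Return one on/off control token per perspective (assumed token format)."""
    return [f"<{p}:{'on' if p in active else 'off'}>" for p in PERSPECTIVES]


def tag_example(source: str, target: str, active: Set[str]) -> Tuple[str, str]:
    """Prefix the source text with switch tokens; the target is left unchanged."""
    unknown = set(active) - set(PERSPECTIVES)
    if unknown:
        raise ValueError(f"unknown perspectives: {sorted(unknown)}")
    return " ".join(switch_tokens(active) + [source]), target


# A partially-matched pair (grammar correction only) turns on a single switch.
partial = tag_example("He go home .", "He goes home .", {"grammar"})

# A matched pair (full document revision) turns on every switch.
matched = tag_example("He go home . He was tired .",
                      "He went home because he was tired .",
                      set(PERSPECTIVES))
```

Under this assumption, all datasets can be pooled into one training set, and at inference time every switch would be set to "on" to request a full revision.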