A Simple and General Strategy for Referential Problem in Low-Resource Neural Machine Translation

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission
Keywords: machine translation, Referential Problem, low-resource
Abstract: This paper aims to solve a series of referential problems in sequence decoding caused by data sparsity and corpus scarcity in low-resource Neural Machine Translation (NMT), including missing pronouns, reference errors, and bias. It is difficult to identify the root cause of these problems because they appear only in the prediction results and involve all aspects of the model. Unlike the usual solutions based on complex hand-crafted rules and added artificial features, we instead turn the problems in the predictions into noise as far as possible and use adversarial training to let the model find a balance between the noise and the golden samples, rather than tracing the cause of each problem during training. In this paper, a simple noise-based preprocessing operation and a slight modification of adversarial training are enough for the model to generalize across a range of referential problems in low-resource NMT. On Korean-Chinese, Mongolian-Chinese, and Arabic-Chinese tasks, both BLEU scores and the accuracy of pronouns in the output sequences are significantly improved.
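The sketch below is a minimal, hypothetical illustration of the kind of noise-based preprocessing the abstract describes (it is not the authors' released code): referential errors such as pronoun dropping or substitution are injected into copies of the golden target sentences, producing noisy samples that an adversarial training scheme can later contrast with the originals. The pronoun inventory, function name `inject_referential_noise`, and noise rates are all illustrative assumptions.

```python
# Hypothetical sketch of noise-based preprocessing for referential problems.
# Pronouns in a tokenized target sentence are randomly dropped (simulating
# missing pronouns) or replaced by another pronoun (simulating reference
# errors). The resulting noisy samples would be paired with the golden ones
# for adversarial training; none of this is taken from the paper's code.

import random

# Illustrative Chinese pronoun inventory used for substitution noise.
PRONOUNS = ["他", "她", "它", "我", "你", "我们", "他们", "她们"]

def inject_referential_noise(tokens, drop_prob=0.15, swap_prob=0.15, rng=None):
    """Return a noisy copy of `tokens` with pronouns randomly dropped or
    replaced by a different pronoun."""
    rng = rng or random
    noisy = []
    for tok in tokens:
        if tok in PRONOUNS:
            r = rng.random()
            if r < drop_prob:
                continue  # pronoun missing
            if r < drop_prob + swap_prob:
                # reference error: substitute a different pronoun
                noisy.append(rng.choice([p for p in PRONOUNS if p != tok]))
                continue
        noisy.append(tok)
    return noisy

if __name__ == "__main__":
    golden = ["他", "说", "他们", "明天", "来"]
    print(inject_referential_noise(golden, rng=random.Random(0)))
```

Under this reading, the preprocessing step stays independent of the NMT architecture, which is consistent with the abstract's claim that only a slight modification of adversarial training is needed on top of it.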
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=ex9ywyCFsS