Abstract: Neural machine translation (NMT) has recently attracted considerable attention for its ability to produce highly accurate translations. Despite this promise, NMT faces a major hurdle: insufficient training data, which can severely degrade translation performance, particularly for low-resource languages. This obstacle limits the applicability of NMT across diverse domains. To alleviate the issue, a novel data augmentation (DA) method is proposed to expand the training set. It enriches the diversity of the training samples with pseudo-parallel sentence pairs generated by sub-tree exchange and back-translation. The effectiveness of the proposed method is validated through a series of experiments on both simulated and real low-resource translation tasks. The results show that the proposed method outperforms other DA methods and significantly improves translation quality over a strong baseline.
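The sub-tree exchange idea mentioned above can be illustrated with a minimal sketch: given two parallel sentence pairs, each annotated with an aligned sub-phrase span on the source and target sides, swapping the spans on both sides yields new pseudo-parallel pairs. The function name, the tuple layout, and the toy spans below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of sub-tree exchange for data augmentation.
# Each pair is (src_tokens, tgt_tokens, src_span, tgt_span), where the
# spans are half-open (start, end) indices of an aligned sub-tree; in
# practice these would come from parsing and word alignment.

def exchange_subtrees(pair_a, pair_b):
    """Swap the aligned sub-trees of two parallel pairs on both sides,
    producing two new pseudo-parallel sentence pairs."""
    sa, ta, (s0a, s1a), (t0a, t1a) = pair_a
    sb, tb, (s0b, s1b), (t0b, t1b) = pair_b
    new_a = (sa[:s0a] + sb[s0b:s1b] + sa[s1a:],
             ta[:t0a] + tb[t0b:t1b] + ta[t1a:])
    new_b = (sb[:s0b] + sa[s0a:s1a] + sb[s1b:],
             tb[:t0b] + ta[t0a:t1a] + tb[t1b:])
    return new_a, new_b

# Toy English-German pairs whose subject noun phrases are aligned.
pair_a = ("the cat sleeps".split(), "die Katze schläft".split(), (0, 2), (0, 2))
pair_b = ("a dog barks".split(), "ein Hund bellt".split(), (0, 2), (0, 2))
aug_a, aug_b = exchange_subtrees(pair_a, pair_b)
# aug_a → ("a dog sleeps", "ein Hund schläft"); aug_b → ("the cat barks", "die Katze bellt")
```

Back-translation would complement this by translating monolingual target-side text back into the source language with a reverse model, giving further synthetic source-target pairs.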