Exploring Different Granularity in Mongolian-Chinese Machine Translation Based on CNN

Hongbin Wang, Hongxu Hou, Jing Wu, Jinting Li, Wenting Fan

Published: 2017, Last Modified: 21 May 2025, PDCAT 2017, CC BY-SA 4.0

Abstract: In this paper, a translation model based on the Convolutional Neural Network (CNN) architecture is applied to the Mongolian-Chinese translation task. Because Mongolian has a rich morphological structure, we use byte-pair encoding (BPE) to segment Mongolian words into sub-word units. In addition, a Mongolian Correction approach is adopted to reduce the encoding errors that occur in the Mongolian corpus. Statistics show that BPE and Mongolian Correction alleviate the data sparsity caused by the very low-resource Mongolian-Chinese parallel corpus. On the Mongolian-Chinese translation task, our best result is 35.37 BLEU, which exceeds the baseline system by 1.4 BLEU. In the experiments, we also investigate the effect of different translation granularities on translation quality. The results show that sub-word units are more suitable than word units for Mongolian-Chinese translation.
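The abstract states that BPE is used to split Mongolian words into sub-word units but does not give the configuration (toolkit, number of merges, vocabulary size). As a rough illustration only, here is a minimal Python sketch of the generic BPE merge-learning procedure popularized for NMT by Sennrich et al. (2016); it is not the authors' implementation. The toy corpus, the `</w>` end-of-word marker, the merge count, and the helper names `apply_merge`, `learn_bpe`, and `segment` are all hypothetical choices made for this example.

```python
from collections import Counter


def apply_merge(pair, symbols):
    """Replace every left-to-right occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge operations from a {word: frequency} dict (illustrative sketch)."""
    # Each word starts as a sequence of characters plus a hypothetical end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first seen
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            key = apply_merge(best, symbols)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


def segment(word, merges):
    """Split a (possibly unseen) word into sub-word units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = apply_merge(pair, symbols)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical toy corpus; a real system would learn merges from the Mongolian training data.
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges, _ = learn_bpe(corpus, num_merges=10)
    print(segment("lowest", merges))  # an unseen word, split into learned pieces
```

The intuition this is meant to convey: frequent stems and suffixes become single symbols after enough merges, so a rare or unseen inflected form can still be expressed with known sub-word pieces rather than mapped to an unknown-word token. That is one plausible reading of why sub-word granularity reduces data sparsity for a morphologically rich language such as Mongolian.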