Abstract: In this paper, a translation model based on the Convolutional Neural Network (CNN) architecture is applied to the Mongolian-Chinese translation task. Because Mongolian has a rich morphological structure, we use byte-pair encoding (BPE) to segment Mongolian words into sub-word units. In addition, a Mongolian Correction approach is adopted to reduce the encoding errors that occur in the Mongolian corpus. Corpus statistics show that BPE and Mongolian Correction alleviate the data sparsity that results from the very low-resource Mongolian-Chinese parallel corpus. On the Mongolian-Chinese translation task, our best result is 35.37 BLEU, which exceeds the baseline system by 1.4 BLEU. We also investigate the effect of different translation granularities on translation quality. The experimental results show that sub-word units are more suitable than word units for Mongolian-Chinese translation.
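To make the segmentation step concrete, the following is a minimal sketch of the standard BPE merge-learning procedure. It is not the implementation used in the paper: the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the toy Latin-letter words are assumptions chosen for illustration only.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: frequency} table."""
    # Each word starts as a sequence of characters plus an end-of-word marker
    # (the marker "</w>" is an assumption of this sketch).
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Merge the most frequent pair everywhere in the vocabulary.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a word by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy example with placeholder words (not real Mongolian data).
toy_freqs = {"aab": 5, "aac": 3}
toy_merges = learn_bpe(toy_freqs, num_merges=1)
print(toy_merges)
print(segment("aab", toy_merges))
```

Because frequent character sequences are merged into single units while rare words stay split into smaller pieces, a segmentation of this kind lets rare inflected forms share sub-word units with more frequent ones, which is the mechanism by which sub-word units can reduce data sparsity.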