Abstract: Pre-training has been shown to significantly improve the performance of low-resource neural machine translation (NMT). However, common pre-training methods such as BERT use an attention mechanism based on Levenshtein distance (LD) to extract language features, ignoring syntax-related information. In this paper, we propose a machine translation pre-training method with semantic perception that builds on traditional position-based modeling: we use semantic role labeling (SRL) to annotate sentences with “predicate-argument” structures at the word level, and merge the vectorized SRL labels with word vectors to deepen the model’s understanding of deep semantics. In addition, to avoid a parameter explosion, we propose a hierarchical knowledge distillation method that fuses the NMT model with the pre-trained model, adapting it to the output probability distribution of the pre-trained model. We validate the method on the LDC English-Chinese (En-Zh) task and the CCMT2017 Mongolian-Chinese (Mo-Ch), Uyghur-Chinese (Uy-Ch), and Tibetan-Chinese (Ti-Ch) tasks. The results show that our model achieves significant improvements over the baselines, which demonstrates the generalization ability of the method.
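The abstract describes merging vectorized SRL labels with word vectors at the word level. A minimal sketch of that idea is shown below; the label inventory, one-hot vectorization, and concatenation-based fusion are illustrative assumptions, not the paper's exact design:

```python
# Hypothetical sketch: fusing SRL label vectors with word embeddings.
# SRL_LABELS, one_hot, and fuse_embeddings are illustrative names, not from the paper.

SRL_LABELS = {"A0": 0, "V": 1, "A1": 2, "O": 3}  # toy predicate-argument label set

def one_hot(index, size):
    """Vectorize an SRL label as a one-hot vector."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def fuse_embeddings(word_vecs, srl_tags, label_dim=len(SRL_LABELS)):
    """Concatenate each word vector with its vectorized SRL label."""
    fused = []
    for vec, tag in zip(word_vecs, srl_tags):
        fused.append(vec + one_hot(SRL_LABELS[tag], label_dim))
    return fused

# "The boy kicked the ball": agent (A0), predicate (V), patient (A1)
words = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4], [0.2, 0.2], [0.0, 0.6]]
tags = ["A0", "A0", "V", "A1", "A1"]
fused = fuse_embeddings(words, tags)  # each fused vector has 2 + 4 dimensions
```

In a real model the label vectors would be learned embeddings rather than one-hot vectors, and fusion could be summation instead of concatenation; the sketch only shows the word-level alignment of SRL information with word vectors.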
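The abstract also mentions adapting the NMT model to the output probability distribution of the pre-trained model via knowledge distillation. A minimal sketch of a standard distillation loss (the temperature, function names, and KL formulation are assumptions, not the paper's exact hierarchical variant):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student): penalizes the student (NMT model) for
    deviating from the teacher's (pre-trained model's) output distribution."""
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Loss is zero when the student matches the teacher exactly,
# and positive when the distributions differ.
matched = kd_loss([1.0, 0.5, -0.2], [1.0, 0.5, -0.2])
mismatched = kd_loss([1.0, 0.5, -0.2], [2.0, -0.5, 0.3])
```

The hierarchical aspect described in the abstract would apply such a loss at multiple layers of the model rather than only at the output; the sketch shows only the per-distribution term.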