Abstract: This paper describes the Tencent minority-Mandarin translation systems submitted to CCMT19. We participate in three translation directions: Uighur\(\rightarrow \)Chinese, Tibetan\(\rightarrow \)Chinese and Mongolian\(\rightarrow \)Chinese. Our systems, called TenTrans, are neural machine translation systems based on Google's Transformer architecture and trained with our improved version of Marian. We also adopt most of the techniques recently shown to be effective in academia, such as back-translation-based sampling, data selection, sequence-level knowledge distillation, ensemble distillation, model ensembling and reranking. With these techniques, our submitted systems achieve consistent performance improvements.
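Model ensembling, one of the techniques listed above, can be illustrated with a minimal sketch. It assumes each model is a hypothetical callable that maps a target prefix to a next-token probability distribution. The ensemble averages these distributions at every decoding step and picks the best token greedily. Averaging probabilities is only one common variant; many toolkits combine weighted log-probabilities instead. Nothing here is claimed to be how TenTrans or Marian actually combine models.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical model interface (an assumption, not the TenTrans/Marian API):
# given a target prefix of token ids, return next-token probabilities over the vocabulary.
NextTokenModel = Callable[[Sequence[int]], List[float]]


def ensemble_next_token_logprobs(models: Sequence[NextTokenModel],
                                 prefix: Sequence[int]) -> List[float]:
    """Average the models' next-token probabilities, then return log-probabilities."""
    dists = [m(prefix) for m in models]
    vocab_size = len(dists[0])
    avg = [sum(d[v] for d in dists) / len(dists) for v in range(vocab_size)]
    # Zero-probability tokens get -inf so they are never selected.
    return [math.log(p) if p > 0 else float("-inf") for p in avg]


def greedy_ensemble_decode(models: Sequence[NextTokenModel], bos: int, eos: int,
                           max_len: int = 20) -> List[int]:
    """Greedy decoding: at each step, append the argmax of the ensembled distribution."""
    prefix = [bos]
    for _ in range(max_len):
        logprobs = ensemble_next_token_logprobs(models, prefix)
        best = max(range(len(logprobs)), key=lambda v: logprobs[v])
        prefix.append(best)
        if best == eos:
            break
    return prefix[1:]  # drop the BOS token
```

With a toy vocabulary of three tokens (0 = BOS, 1 = a word, 2 = EOS), a model that prefers token 1 on its own can be overruled once its distribution is averaged with a second model that prefers EOS. This shows how an ensemble smooths out the idiosyncrasies of individual models.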