Syntactic Relevance XLNet Word Embedding Generation in Low-Resource Machine Translation

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: XLNet, Word Embedding, Machine Translation, Low Resource
Abstract: Semantic understanding is an important factor affecting the quality of machine translation for low-resource agglutinative languages. Common methods (sub-word modeling, pre-trained word embeddings, etc.) increase the length of the sequence, which leads to a surge in computation. At the same time, pre-trained word embeddings with rich context are also a precondition for improving semantic understanding. Although BERT uses a masked language model to generate dynamic embeddings in parallel, the fine-tuning stage contains no masks, which makes it inconsistent with the pre-training data and introduces a pretrain-finetune discrepancy. We therefore propose a word embedding generation method based on an improved XLNet, which corrects this defect of the BERT model and alleviates the sampling redundancy of the traditional XLNet. Experiments are carried out on the CCMT2019 Mongolian-Chinese, Uyghur-Chinese, and Tibetan-Chinese tasks. The results show that both the generalization ability and the BLEU scores of our method improve over the baseline, which verifies the effectiveness of the method.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=qsnsPsUMIe