NECAT-CLWE: A Simple But Efficient Parallel Data Generation Approach for Unsupervised and Semi-Supervised Neural Machine Translation

Published: 08 Apr 2022, Last Modified: 05 May 2023, AfricaNLP 2022, Readers: Everyone
Keywords: Parallel Data Generation, Named Entity Recognition, Cross-lingual Word Embedding, Neural Machine Translation, Machine Translation
Abstract: Many languages lack sufficient data to train high-quality translation systems, particularly those based on cutting-edge neural machine translation architectures. Recently, it has been demonstrated that using an exact copy of the monolingual target data as the source data improves the quality of translation systems, allowing them to benefit from proper nouns and similar words that do not require translation. However, using an exact copy of the target data contaminates the source data with target-language terms that do need translation. We therefore describe in this paper a similar but more effective parallel data generation approach for improving low-resource neural machine translation, which combines named entity copying with approximate translation using cross-lingual word embeddings (NECAT-CLWE). The approach will be evaluated on low-resource English-Hausa neural machine translation.
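The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the hand-made two-dimensional vectors, the toy vocabulary, and the rule of copying tokens that have no embedding are all assumptions made for illustration. The sketch builds a pseudo-source sentence from a monolingual target sentence by copying named entities unchanged and replacing every other token with its nearest source-language neighbour in a shared embedding space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_source_word(target_vec, source_embeddings):
    """Return the source-language word whose vector is closest to target_vec."""
    return max(source_embeddings,
               key=lambda word: cosine(target_vec, source_embeddings[word]))


def make_pseudo_source(target_tokens, entities, target_embeddings, source_embeddings):
    """Build a pseudo-source sentence from a target-language sentence.

    Named entities are copied as-is; other tokens are approximately
    translated through the shared embedding space. Tokens without an
    embedding are copied unchanged (an assumption of this sketch).
    """
    pseudo = []
    for token in target_tokens:
        if token in entities:
            pseudo.append(token)  # named entity: copy, no translation needed
        elif token in target_embeddings:
            pseudo.append(nearest_source_word(target_embeddings[token],
                                              source_embeddings))
        else:
            pseudo.append(token)  # out-of-vocabulary fallback (assumed)
    return pseudo


# Toy, hand-made vectors standing in for real cross-lingual embeddings.
toy_target_embeddings = {"ya": [1.0, 0.1], "tafi": [0.1, 1.0]}
toy_source_embeddings = {"he": [0.9, 0.2], "went": [0.2, 0.9], "city": [0.7, 0.7]}
toy_entities = {"Musa", "Kano"}  # would come from a named entity recognizer

toy_result = make_pseudo_source(["Musa", "ya", "tafi", "Kano"],
                                toy_entities,
                                toy_target_embeddings,
                                toy_source_embeddings)
print(toy_result)
```

In a real setting the entity set would come from a named entity recognizer and the vectors from embeddings aligned across the two languages; each pseudo-source sentence would then be paired with its original target sentence to form synthetic parallel training data.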