Abstract: After years of development, Neural Machine
Translation (NMT) has produced richer translation results
than ever across various language pairs, becoming a new
machine translation model with great potential. An NMT
model can only translate words/characters contained in
its training data. One problem in NMT is the handling of
low-frequency words/characters in the training data. In this
paper, we propose a method that removes characters whose
frequency of appearance is below a given minimum threshold
by decomposing them into their components and/or
pseudo-characters, using a Chinese character decomposition
table that we constructed. Experiments on Japanese-to-Chinese
and Chinese-to-Japanese NMT with the ASPEC-JC (Asian
Scientific Paper Excerpt Corpus, Japanese-Chinese) corpus
show that the BLEU scores, the training time, and the number
of parameters vary with the minimum threshold chosen for
character decomposition.
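
The thresholded decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the table `DECOMPOSITION_TABLE`, the function names, and the three-sentence corpus are hypothetical, pseudo-characters are not modeled, and the sketch assumes a single pass in which a rare character without a table entry is left unchanged.

```python
from collections import Counter

# Hypothetical toy table: character -> list of components.
# The table built for the paper is not reproduced here.
DECOMPOSITION_TABLE = {
    "鯨": ["魚", "京"],
    "鮪": ["魚", "有"],
}


def char_frequencies(sentences):
    """Count how often each character appears in the corpus."""
    return Counter(ch for sentence in sentences for ch in sentence)


def decompose_rare(sentences, table, min_freq):
    """Replace each character seen fewer than min_freq times by its components.

    Assumption: frequencies are counted once on the original corpus, and
    characters missing from the table are kept as they are.
    """
    freq = char_frequencies(sentences)
    result = []
    for sentence in sentences:
        pieces = []
        for ch in sentence:
            if freq[ch] < min_freq and ch in table:
                pieces.extend(table[ch])
            else:
                pieces.append(ch)
        result.append("".join(pieces))
    return result


corpus = ["魚魚京有", "鯨京", "鮪有魚"]
decomposed = decompose_rare(corpus, DECOMPOSITION_TABLE, min_freq=2)
print(decomposed)
print(len(char_frequencies(corpus)), "->", len(char_frequencies(decomposed)))
```

In this toy corpus, raising `min_freq` from 1 to 2 removes the two characters that occur only once and shrinks the character vocabulary, which is the kind of trade-off between vocabulary size and sequence length that the threshold controls.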