Abstract: Few Japanese-Chinese bilingual corpora are currently large enough to serve as
training data for neural machine translation (NMT). In particular, few corpora include
spoken language such as daily conversation. In this
research, we attempt to construct a sizable Japanese-Chinese bilingual corpus by crawling
movie and TV series subtitle data from websites. We calculated the BLEU scores of the
constructed WCC-JC (Web Crawled Corpus—Japanese and Chinese) and the other compared corpora.
We also manually evaluated translations produced by a model trained on WCC-JC
to confirm the corpus's quality and effectiveness.