Letter N-Gram-based Input Encoding for Continuous Space Language Models

2013 (modified: 16 Jul 2019), CVSM@ACL 2013, Readers: Everyone
Abstract: We present a letter-based input encoding for words in continuous space language models. Instead of using the word index, we represent each word entirely by its letter n-grams, so that similar words automatically receive similar representations. We hope this generalizes better to unknown or rare words and also captures morphological information. We show the influence of this encoding on machine translation using continuous space language models based on restricted Boltzmann machines. We evaluate translation quality and training time on a German-to-English translation task covering TED and university lectures, and on the news translation task from English to German. With the new approach, a gain of up to 0.4 BLEU points can be achieved.
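
To illustrate the idea of representing a word by its letter n-grams instead of a word index, here is a minimal Python sketch. It is not the authors' implementation: the choice of trigrams, the `#` boundary marker, the count-valued vector, and all function names (`letter_ngrams`, `build_vocab`, `encode`, `overlap`) are assumptions made for this example only.

```python
from collections import Counter


def letter_ngrams(word, n=3, boundary="#"):
    """Return the letter n-grams of a word, padded with a boundary marker.

    The boundary marker is an assumption of this sketch, not taken from the paper.
    """
    padded = boundary + word + boundary
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_vocab(words, n=3):
    """Map every n-gram seen in the training words to an input dimension."""
    vocab = {}
    for w in words:
        for gram in letter_ngrams(w, n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def encode(word, vocab, n=3):
    """Encode a word as a vector of n-gram counts; unseen n-grams are skipped."""
    vec = [0] * len(vocab)
    for gram, count in Counter(letter_ngrams(word, n)).items():
        idx = vocab.get(gram)
        if idx is not None:
            vec[idx] = count
    return vec


def overlap(a, b):
    """Number of n-gram occurrences shared by two encoded words."""
    return sum(min(x, y) for x, y in zip(a, b))


vocab = build_vocab(["word", "words", "cat"])
print(overlap(encode("word", vocab), encode("words", vocab)))
print(overlap(encode("word", vocab), encode("cat", vocab)))
# A word absent from the training list still gets a non-empty encoding:
print(sum(encode("sword", vocab)))
```

In this sketch, "word" and "words" share most of their trigrams while "word" and "cat" share none, and the unseen word "sword" still activates the dimensions of the trigrams it has in common with known words. A word-index (one-hot) input would give it no usable representation.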