Multitask learning of Multilingual Sentence Representations


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: We present a novel multi-task training approach to learning multilingual distributed representations of text. Our system learns word and sentence embeddings jointly by training a multilingual skip-gram model together with a cross-lingual sentence similarity model. We construct sentence embeddings by processing word embeddings with an LSTM and by taking an average of the outputs. Our architecture can transparently use both monolingual and sentence aligned bilingual corpora to learn multilingual embeddings, thus covering a vocabulary significantly larger than the vocabulary of the bilingual corpora alone. Our model shows competitive performance in a standard cross-lingual document classification task. We also show the effectiveness of our method in a low-resource scenario.
  • TL;DR: We jointly train a multilingual skip-gram model and a cross-lingual sentence similarity model to learn high quality multilingual text embeddings that perform well in the low resource scenario.
  • Keywords: multilingual, embedding, representation learning, multi-task learning, low resource