Deep Multilingual Correlation for Improved Word Embeddings

Ang Lu, Weiran Wang, Mohit Bansal, Kevin Gimpel, Karen Livescu

2015 (modified: 16 Jul 2019), HLT-NAACL 2015

Abstract: Word embeddings have been found useful for many NLP tasks, including part-of-speech tagging, named entity recognition, and parsing. Adding multilingual context when learning embeddings can improve their quality, for example via canonical correlation analysis (CCA) on embeddings from two languages. In this paper, we extend this idea to learn deep non-linear transformations of word embeddings of the two languages, using the recently proposed deep canonical correlation analysis. The resulting embeddings, when evaluated on multiple word and bigram similarity tasks, consistently improve over monolingual embeddings and over embeddings transformed with linear CCA.
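
For readers unfamiliar with the linear baseline mentioned in the abstract, the following is a minimal sketch of linear CCA applied to two sets of paired embeddings. It is not the authors' code and does not implement the deep variant. The function name `linear_cca`, the regularization value, and the synthetic "two-language" data are all assumptions made for illustration; in practice the rows of `X` and `Y` would be embeddings of translation pairs.

```python
import numpy as np


def linear_cca(X, Y, k, reg=1e-4):
    """Linear CCA sketch (hypothetical helper, not from the paper).

    X: (n, dx) and Y: (n, dy) hold paired embeddings, one pair per row.
    Returns projections A (dx, k), B (dy, k) and the top-k correlations.
    """
    # Center each view.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Regularized covariance and cross-covariance estimates.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Sxx_is = inv_sqrt(Sxx)
    Syy_is = inv_sqrt(Syy)

    # Singular values of the whitened cross-covariance are the
    # canonical correlations, returned in descending order.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    A = Sxx_is @ U[:, :k]
    B = Syy_is @ Vt.T[:, :k]
    return A, B, s[:k]


# Synthetic stand-in for paired embeddings: both views are noisy linear
# maps of the same 3-dimensional latent signal (an assumption for the demo).
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))
X = Z @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(500, 10))
Y = Z @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(500, 8))

A, B, corrs = linear_cca(X, Y, k=3)
X_proj = (X - X.mean(axis=0)) @ A  # transformed first-view embeddings
```

Here `X_proj` plays the role of the CCA-transformed embeddings for the first language. The deep variant described in the abstract would replace the linear maps `A` and `B` with learned non-linear networks.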
