LEARNING SEMANTIC WORD REPRESENTATIONS VIA TENSOR FACTORIZATION

Anonymous

Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission
  • Abstract: Many state-of-the-art word embedding techniques involve factorization of a co-occurrence-based matrix. We aim to extend this approach by studying word embedding techniques that involve factorization of co-occurrence-based tensors (N-way arrays). We present two new word embedding techniques based on tensor factorization and show that they outperform common methods on several semantic NLP tasks when trained on the same data. To train one of the embeddings, we present a new joint tensor factorization problem and an approach for solving it. Furthermore, we modify the performance metrics for the Outlier Detection task (Camacho-Collados & Navigli, 2016) to measure the quality of the higher-order relationships that a word embedding captures. Our tensor-based methods significantly outperform existing methods at this task when evaluated with our new metric. Finally, we demonstrate that vectors in our embeddings can be composed multiplicatively to create distinct vector representations for each meaning of a polysemous word, in a way that cannot be done with other common embeddings. We show that this property stems from the higher-order information that the vectors contain, and thus is unique to our tensor-based embeddings.
  • Keywords: Word Embeddings, Tensor Factorization, Natural Language Processing
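
Below is a minimal sketch, not the authors' implementation, of the general idea named in the abstract: build a third-order word co-occurrence tensor and factorize it with a CP (CANDECOMP/PARAFAC) decomposition so that each word receives a low-dimensional factor vector. The toy corpus, the window size, the use of raw triple counts (rather than any PMI-style weighting), the plain CP objective (rather than the paper's joint factorization), and all variable names are illustrative assumptions.

```python
import numpy as np

# Toy corpus and vocabulary (illustrative only).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Count co-occurring word triples inside a sliding window of length 3.
X = np.zeros((V, V, V))
for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
    X[idx[a], idx[b], idx[c]] += 1.0

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product of (I x R) and (J x R) -> (I*J x R)."""
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def cp_als(X, rank, n_iter=100, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor via alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    X1 = X.reshape(I, J * K)                      # mode-1 unfolding
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)   # mode-2 unfolding
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)   # mode-3 unfolding
    for _ in range(n_iter):
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

A, B, C = cp_als(X, rank=4)

# Each row of a factor matrix is a candidate embedding for the corresponding word.
# Multiplicative composition, as described in the abstract, would combine two
# such vectors element-wise to obtain a context-specific representation.
composed = A[idx["cat"]] * A[idx["sat"]]
print(composed)
```

In this sketch each mode of the tensor gets its own factor matrix; a symmetric variant would tie the three factor matrices together so that every word has a single vector, which is closer in spirit to treating the tensor as a higher-order analogue of a co-occurrence matrix.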
