Distributional Inclusion Vector Embedding for Unsupervised Hypernymy Detection


Nov 03, 2017 (modified: Dec 12, 2017) ICLR 2018 Conference Blind Submission readers: everyone
  • Abstract: Modeling hypernymy, such as poodle is-a dog, is an important generalization aid for many NLP tasks, such as entailment, relation extraction, and question answering. Supervised learning from labeled hypernym sources, such as WordNet, limits the coverage of these models, which can be addressed by learning hypernyms from unlabeled text. Existing unsupervised methods either do not scale to large vocabularies or yield unacceptably poor accuracy. This paper introduces distributional inclusion vector embedding (DIVE), a simple-to-implement unsupervised method of hypernym discovery via per-word non-negative vector embeddings which preserve the inclusion property of word contexts. In experimental evaluations more comprehensive than any previous literature of which we are aware (evaluating on 11 datasets using multiple existing as well as newly proposed scoring functions), we find that our method provides up to double the precision of previous unsupervised methods, and the highest average performance, using a much more compact word representation, and yielding many new state-of-the-art results. In addition, the meaning of each dimension in DIVE is interpretable, which leads to a novel approach to word sense disambiguation as another promising application of DIVE.
  • TL;DR: We propose a novel unsupervised word embedding which preserves the inclusion property in the context distribution and achieves state-of-the-art results on unsupervised hypernymy detection.
  • Keywords: unsupervised word embedding, unsupervised hypernym detection, distributional inclusion hypothesis, non-negative matrix factorization, word sense disambiguation, hypernym scoring functions
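
To make the inclusion idea in the abstract concrete, here is a minimal sketch of how an asymmetric, inclusion-style score could be computed over non-negative word vectors. The function name `inclusion_score`, the min-based ratio, and the toy vectors are all assumptions chosen for illustration. They are not taken from the paper and are not necessarily one of the scoring functions it evaluates.

```python
def inclusion_score(hypo, hyper):
    """Fraction of the candidate hyponym's mass covered by the candidate hypernym.

    Both arguments are equal-length sequences of non-negative numbers.
    Hypothetical helper for illustration only.
    """
    if len(hypo) != len(hyper):
        raise ValueError("vectors must have the same length")
    if any(x < 0 for x in hypo) or any(x < 0 for x in hyper):
        raise ValueError("vectors must be non-negative")
    total = sum(hypo)
    if total == 0:
        return 0.0  # an all-zero vector carries no evidence either way
    covered = sum(min(a, b) for a, b in zip(hypo, hyper))
    return covered / total


# Toy, hand-made vectors (hypothetical values, not learned embeddings).
poodle = [2.0, 1.0, 0.0, 0.0]
dog = [3.0, 2.0, 1.0, 0.0]

forward = inclusion_score(poodle, dog)   # evidence for "poodle is-a dog"
backward = inclusion_score(dog, poodle)  # evidence for "dog is-a poodle"
print(forward, backward)
```

In this toy case every dimension of `poodle` is dominated by the matching dimension of `dog`, so the forward score is higher than the backward one. That asymmetry is what an inclusion-preserving embedding is meant to expose.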