Imposing Category Trees Onto Word-Embeddings Using A Geometric ConstructionDownload PDF

Published: 21 Dec 2018, Last Modified: 05 May 2023ICLR 2019 Conference Blind SubmissionReaders: Everyone
Abstract: We present a novel method to precisely impose tree-structured category information onto word-embeddings, resulting in ball embeddings in higher dimensional spaces (N-balls for short). Inclusion relations among N-balls implicitly encode subordinate relations among categories. The similarity measurement in terms of the cosine function is enriched by category information. Using a geometric construction method instead of back-propagation, we create large N-ball embeddings that satisfy two conditions: (1) category trees are precisely imposed onto word embeddings at zero energy cost; (2) pre-trained word embeddings are well preserved. A new benchmark data set is created for validating the category of unknown words. Experiments show that N-ball embeddings, carrying category information, significantly outperform word embeddings in the test of nearest neighborhoods, and demonstrate surprisingly good performance in validating categories of unknown words. Source codes and data-sets are free for public access \url{https://github.com/gnodisnait/nball4tree.git} and \url{https://github.com/gnodisnait/bp94nball.git}.
Keywords: category tree, word-embeddings, geometry
TL;DR: we show a geometric method to perfectly encode categroy tree information into pre-trained word-embeddings.
Code: [![github](/images/github_icon.svg) gnodisnait/nball4tree](https://github.com/gnodisnait/nball4tree) + [![Papers with Code](/images/pwc_icon.svg) 1 community implementation](https://paperswithcode.com/paper/?openreview=rJlWOj0qF7)
38 Replies

Loading