Word embedding re-examined: is the symmetrical factorization optimal?Download PDF

25 Sep 2019 (modified: 24 Dec 2019)ICLR 2020 Conference Blind SubmissionReaders: Everyone
  • Original Pdf: pdf
  • Abstract: As observed in previous works, many word embedding methods exhibit two interesting properties: (1) words having similar semantic meanings are embedded closely; (2) analogy structure exists in the embedding space, such that ''emph{Paris} is to \emph{France} as \emph{Berlin} is to \emph{Germany}''. We theoretically analyze the inner mechanism leading to these nice properties. Specifically, the embedding can be viewed as a linear transformation from the word-context co-occurrence space to the embedding space. We reveal how the relative distances between nodes change during this transforming process. Such linear transformation will result in these good properties. Based on the analysis, we also provide the answer to a question whether the symmetrical factorization (e.g., \texttt{word2vec}) is better than traditional SVD method. We propose a method to improve the embedding further. The experiments on real datasets verify our analysis.
  • Code: https://www.dropbox.com/sh/5d5j4pthcgzutdf/AABUvZPJpxUo8ugff1gQ7fIQa?dl=0
  • Keywords: word embedding, matrix factorization, linear transformation, neighborhood structure
7 Replies