Semantic Hashing with Locality Sensitive Embeddings

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission
Keywords: Semantic Hashing, Approximate Nearest Neighbor
Abstract: Semantic hashing methods have been explored for learning transformations into binary vector spaces. These learned binary representations may then be used in hashing-based retrieval methods, typically by retrieving all neighboring elements in the Hamming ball of radius 1 or 2. Prior studies focus on tasks with at most a few dozen to a few hundred semantic categories, and it is not well understood how these methods scale to domains with richer semantic structure. In this study, we focus on learning embeddings for use in exact hashing retrieval, where Approximate Nearest Neighbor search reduces to a simple table lookup. We propose similarity learning methods in which the optimized similarity is the angular similarity (the probability of collision under SimHash). We demonstrate the benefits of these embeddings on a variety of domains, including a co-occurrence modelling task on a large-scale text corpus whose rich structure cannot be captured by a few hundred semantic groups.
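The sketch below illustrates the SimHash view referenced in the abstract: the hash code of an embedding is the sign pattern of random projections, two vectors agree on a single bit with probability equal to their angular similarity (1 − θ/π), and exact hashing retrieval is a table lookup keyed by the code. The dimensions, bit count, and table layout are illustrative assumptions, not details taken from the paper.

```python
# Minimal SimHash sketch (assumed setup, not the paper's implementation).
import numpy as np
from collections import defaultdict

def simhash(x, planes):
    """Sign of random projections, packed into an integer hash code."""
    bits = (planes @ x) > 0
    return int("".join("1" if b else "0" for b in bits), 2)

def angular_similarity(x, y):
    """Probability that x and y agree on one SimHash bit: 1 - theta/pi."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    return 1.0 - theta / np.pi

rng = np.random.default_rng(0)
dim, n_bits = 64, 16                          # illustrative sizes
planes = rng.standard_normal((n_bits, dim))   # random hyperplanes

# Exact hashing retrieval: a plain table keyed by the hash code,
# so lookup requires no Hamming-ball enumeration.
table = defaultdict(list)
embeddings = rng.standard_normal((1000, dim))
for i, e in enumerate(embeddings):
    table[simhash(e, planes)].append(i)

query = embeddings[0]
candidates = table[simhash(query, planes)]    # single table lookup
```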
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We extend semantic hashing methods to problems with substantial observation noise and to the exact hashing retrieval setting; applied to large-scale text, the method discovers meaningful word hash clusters and outperforms baselines.
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=g-GOdZap0N