Learning Representations for Faster Similarity Search

Ludwig Schmidt, Kunal Talwar

Feb 15, 2018 (modified: Feb 15, 2018) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: In high dimensions, the performance of nearest neighbor algorithms depends crucially on structure in the data. While traditional nearest neighbor datasets consisted mostly of hand-crafted feature vectors, an increasing number of datasets comes from representations learned with neural networks. We study the interaction between nearest neighbor algorithms and neural networks in more detail. We find that the network architecture can significantly influence the efficacy of nearest neighbor algorithms even when the classification accuracy is unchanged. Based on our experiments, we propose a number of training modifications that lead to significantly better datasets for nearest neighbor algorithms. Our modifications lead to learned representations that can accelerate nearest neighbor queries by 5x.
  • TL;DR: We show how to get good representations from the point of view of Simiarity Search.