Keywords: approximate nearest neighbor search, product quantization, sorting transformation
Abstract: For vector data, a sorting transformation sorts the elements of a vector by permuting the locations of its elements. It can be shown that among all permutation transformations, the sorting transformation minimizes L2 distance and maximizes similarity measures such as cosine similarity and Pearson correlation for vector data. Applying sorting transformation with vector/product quantization can substantially reduce compression errors when the same codebook size is applied, with a mild storage overhead for saving the sorting permutations for each compressed vector. For nearest neighbor search, the sorting transformation may produce false positive nearest neighbors when directly applied with the product quantization search algorithm. This problem was initially addressed via a re-ranking approach. In this work, we gave more in depth analysis to show that the re-ranking approach may be inadequate, especially for evenly distributed data. To address this problem, we adapted the lookup table approach for nearest neighbor search using sorting transformation based product quantization and showed its effectiveness on both real and simulated data.
Submission Number: 15
Loading