Improving Sign-Random-Projection via Count Sketch

Published: 20 May 2022, Last Modified: 05 May 2023, UAI 2022 Poster
Keywords: Dimensionality Reduction, Sketching Algorithm, Signed Random Projection, Count-Sketch, Cosine Similarity
Abstract: Computing the angular similarity between pairs of vectors is a core component of many machine learning algorithms. The seminal work of Charikar [simhash], known as Sign-Random-Projection (SRP) or SimHash, provides an unbiased estimate of this similarity. However, SRP has two limitations: (i) high variance in the similarity estimate, and (ii) high running time when computing the sketch. Improved variants exist, but each addresses only one of these limitations. For example, [CBE] propose a faster algorithm, while [superbit, MLE] provide estimates with smaller variance. In this work, we propose a single sketching algorithm that addresses both: it computes the sketch faster and yields lower-variance similarity estimates. Our algorithm is also space-efficient. We present a rigorous theoretical analysis of our proposal and complement it with experiments on synthetic and real-world datasets.
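For reference, the baseline SRP (SimHash) sketch that this work improves on fits in a few lines. The NumPy code below is a minimal sketch of Charikar's construction only. It is not the authors' count-sketch algorithm. The function names `make_projection`, `srp_sketch`, and `estimate_angle` are hypothetical.

Each of the k bits is sign(<r_i, x>) for an i.i.d. Gaussian vector r_i. Two vectors x and y disagree on a given bit with probability theta/pi, where theta is the angle between them. The fraction of disagreeing bits, scaled by pi, is therefore an unbiased estimate of theta. The dense projection costs O(d k) per vector, which is the running-time limitation (ii) above.

```python
import numpy as np


def make_projection(d: int, k: int, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: a d x k matrix of i.i.d. standard Gaussian entries."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((d, k))


def srp_sketch(x: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Return the k sign bits of x under projection R (shape d x k).

    Dense matrix-vector product: O(d * k) time per vector.
    A projection of exactly 0 is mapped to True (a probability-zero event).
    """
    return (x @ R) >= 0


def estimate_angle(bits_x: np.ndarray, bits_y: np.ndarray) -> float:
    """Estimate the angle theta(x, y) from two SRP sketches.

    Each bit differs with probability theta / pi, so
    pi * (Hamming distance / k) is an unbiased estimate of theta.
    """
    hamming = np.count_nonzero(bits_x != bits_y)
    return np.pi * hamming / bits_x.size


if __name__ == "__main__":
    d, k = 256, 4096
    R = make_projection(d, k)
    rng = np.random.default_rng(1)
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    true_theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
    est_theta = estimate_angle(srp_sketch(x, R), srp_sketch(y, R))
    print(f"true angle {true_theta:.3f}, SRP estimate {est_theta:.3f}")
    # Plug-in cosine estimate; cos(theta_hat) is not itself unbiased.
    print(f"estimated cosine similarity {np.cos(est_theta):.3f}")
```

The variance of the angle estimate shrinks only as 1/k, so tighter estimates require more bits. Each extra bit adds another O(d) of projection work. These are the two costs (variance and sketching time) that the abstract targets.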