Probabilistic Polynomials and Hamming Nearest Neighbors

Published: 2015, Last Modified: 09 May 2023, FOCS 2015, Readers: Everyone
Abstract: We show how to compute any symmetric Boolean function on $n$ variables over any field (as well as the integers) with a probabilistic polynomial of degree $O(\sqrt{n \log(1/\varepsilon)})$ and error at most $\varepsilon$. The degree dependence on $n$ and $\varepsilon$ is optimal, matching a lower bound of Razborov (1987) and Smolensky (1987) for the MAJORITY function. The proof is constructive: a low-degree polynomial can be efficiently sampled from the distribution. This polynomial construction is combined with other algebraic ideas to give the first subquadratic time algorithm for computing a (worst-case) batch of Hamming distances in superlogarithmic dimensions, exactly. To illustrate, let $c(n) : \mathbb{N} \to \mathbb{N}$. Suppose we are given a database $D$ of $n$ vectors in $\{0,1\}^{c(n) \log n}$ and a collection of $n$ query vectors $Q$ in the same dimension. For all $u \in Q$, we wish to compute a $v \in D$ with minimum Hamming distance from $u$. We solve this problem in $n^{2 - 1/O(c(n) \log^2 c(n))}$ randomized time. Hence, the problem is in “truly subquadratic” time for $O(\log n)$ dimensions, and in subquadratic time for $d = o((\log^2 n)/(\log \log n)^2)$. We apply the algorithm to computing pairs with maximum inner product, closest pair in $\ell_1$ for vectors with bounded integer entries, and pairs with maximum Jaccard coefficients.
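For orientation, the batch problem stated in the abstract can be written down as a naive quadratic baseline. The sketch below is not the paper's algorithm: it compares every query against every database vector, which is the roughly n² · d cost that the probabilistic-polynomial approach is designed to beat. The function names `hamming` and `batch_nearest_neighbors` are illustrative assumptions, and returning the smallest index on ties is an arbitrary choice.

```python
from typing import List, Sequence


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Count the coordinates where two equal-length 0/1 vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a != b for a, b in zip(u, v))


def batch_nearest_neighbors(
    D: Sequence[Sequence[int]], Q: Sequence[Sequence[int]]
) -> List[int]:
    """For each query u in Q, return the index of a vector in D at
    minimum Hamming distance from u (smallest index on ties).

    Brute force: every query is compared against every database vector.
    """
    return [min(range(len(D)), key=lambda i: hamming(u, D[i])) for u in Q]


if __name__ == "__main__":
    D = [(0, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 1)]
    Q = [(1, 0, 0, 0), (0, 1, 1, 1)]
    print(batch_nearest_neighbors(D, Q))  # prints [0, 2]
```

A baseline like this is mainly useful as a correctness check for any faster implementation on small inputs.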