BPTree: An ℓ2 Heavy Hitters Algorithm Using Constant Memory
Abstract: The task of finding heavy hitters is one of the best known and well studied problems in the area of data streams. One is given a list $i_1, i_2, \ldots, i_m \in [n]$, and the goal is to identify the items among $[n]$ that appear frequently in the list. In sub-polynomial space, the strongest guarantee available is the $\ell_2$ guarantee, which requires finding all items that occur at least $\varepsilon \|f\|_2$ times in the stream, where the vector $f \in \mathbb{R}^n$ is the count histogram of the stream, with $i$th coordinate equal to the number of times $i$ appears: $f_i := \#\{ j \in [m] : i_j = i \}$. The first algorithm to achieve the $\ell_2$ guarantee was the CountSketch of [11], which requires $O(\varepsilon^{-2} \log n)$ words of memory and $O(\log n)$ update time and is known to be space-optimal if the stream allows for deletions. The recent work of [7] gave an improved algorithm for insertion-only streams, using only $O(\varepsilon^{-2} \log \varepsilon^{-1} \log\log n)$ words of memory. In this work, we give an algorithm BPTree for $\ell_2$ heavy hitters in insertion-only streams that achieves $O(\varepsilon^{-2} \log \varepsilon^{-1})$ words of memory and $O(\log \varepsilon^{-1})$ update time, which is the optimal dependence on $n$ and $m$. In addition, we describe an algorithm for tracking $\|f\|_2$ at all times with $O(\varepsilon^{-2})$ memory and update time. Our analyses rely on bounding the expected supremum of a Bernoulli process involving Rademachers with limited independence, which we accomplish via a Dudley-like chaining argument that may have applications elsewhere.
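To make the $\ell_2$ guarantee concrete, the following minimal sketch computes the heavy hitters exactly from the definition in the abstract. It is a linear-memory reference baseline, not BPTree or CountSketch; the function name `l2_heavy_hitters` and the example stream are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def l2_heavy_hitters(stream, eps):
    """Return {item: count} for every item with f_i >= eps * ||f||_2.

    Exact baseline that stores the full histogram f, so it uses memory
    linear in the number of distinct items (unlike a streaming sketch).
    """
    freq = Counter(stream)                           # f_i = #{j : i_j = i}
    l2_norm = sqrt(sum(c * c for c in freq.values()))  # ||f||_2
    threshold = eps * l2_norm
    return {item: c for item, c in freq.items() if c >= threshold}


# Hypothetical stream: item 1 appears 10 times, item 2 appears 3 times,
# and items 3..22 appear once each, so ||f||_2 = sqrt(100 + 9 + 20) = sqrt(129).
stream = [1] * 10 + [2] * 3 + list(range(3, 23))

heavy_half = l2_heavy_hitters(stream, 0.5)      # threshold is about 5.68
heavy_quarter = l2_heavy_hitters(stream, 0.25)  # threshold is about 2.84
```

In this example, item 2 accounts for only 3 of the 33 stream updates, yet it qualifies at $\varepsilon = 0.25$ because the threshold scales with $\|f\|_2$ rather than with the stream length $m$.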