We propose Partitioned Learned Count-Min Sketch (PL-CMS), a new approach to learning-augmented frequent item identification in data streams. Our method builds on the learned Count-Min Sketch (LCMS) algorithm of Hsu et al. (ICLR 2019), which combines a standard Count-Min Sketch frequency estimation data structure with a learned model. LCMS uses the model to partition items in the input stream into two sets: items with sufficiently high predicted frequencies have their frequencies tracked exactly, while the remaining items, with low predicted frequencies, are placed into the Count-Min Sketch data structure.
Inspired by an approach of Vaidya et al. for learning-augmented Bloom filters (ICLR 2021), our PL-CMS algorithm partitions items into multiple sets based on several predicted-frequency thresholds. Each set is handled by a separate Count-Min Sketch data structure. Unlike classic LCMS, which relies on a single threshold, this allows the algorithm to take advantage of the full prediction space of the learned model. We demonstrate that, given fixed partitioning thresholds, the parameters of our data structure can be efficiently optimized using a convex program. Empirically, we show that, on a variety of benchmarks, PL-CMS obtains a lower false positive rate for frequent item identification than both LCMS and the standard Count-Min Sketch.
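To make the partitioning idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It routes each item to one of several Count-Min Sketches according to which threshold interval its predicted score falls in. The class names `CountMinSketch` and `PartitionedSketch`, the `predict` callable, the BLAKE2-based hashing, and the hand-picked widths are all assumptions made for illustration; in particular, the convex program that chooses the per-partition parameters is not shown here.

```python
import hashlib
from bisect import bisect_right


class CountMinSketch:
    """Plain Count-Min Sketch with `depth` rows of `width` counters."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash per row, obtained by salting a stable digest with the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only add mass, so the minimum over rows never undercounts.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


class PartitionedSketch:
    """Hypothetical wrapper: one Count-Min Sketch per predicted-score interval."""

    def __init__(self, thresholds, widths, depth, predict):
        # k ascending thresholds split the score range into k + 1 partitions;
        # widths[i] is the width of the sketch for the i-th lowest interval.
        if len(widths) != len(thresholds) + 1:
            raise ValueError("need exactly one width per partition")
        self.thresholds = sorted(thresholds)
        self.predict = predict  # assumed: maps an item to a numeric score
        self.sketches = [CountMinSketch(w, depth) for w in widths]

    def _partition(self, item):
        # Index of the interval that contains the item's predicted score.
        return bisect_right(self.thresholds, self.predict(item))

    def add(self, item, count=1):
        self.sketches[self._partition(item)].add(item, count)

    def estimate(self, item):
        return self.sketches[self._partition(item)].estimate(item)

    def is_frequent(self, item, min_count):
        # An overestimate that crosses min_count would be a false positive.
        return self.estimate(item) >= min_count


# Toy usage with made-up scores standing in for a learned model.
scores = {"a": 0.9, "b": 0.5, "c": 0.1}
sketch = PartitionedSketch(
    thresholds=[0.3, 0.7],
    widths=[8, 16, 32],
    depth=3,
    predict=lambda item: scores.get(item, 0.0),
)
for item in ["a"] * 5 + ["b"] * 2 + ["c"]:
    sketch.add(item)
```

In this toy stream each item lands in its own partition, so no collisions occur and the estimates equal the true counts. With realistic streams the per-partition widths matter, which is where the optimization described above would come in.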