Abstract: In the maximum coverage problem, we are given $d$ subsets of a universe $[n]$, and the goal is to output $k$ subsets whose union covers the largest possible number of distinct items. We present the first algorithm for maximum coverage in the turnstile streaming model, where updates that insert or delete an item from a subset arrive one by one. Notably, our algorithm uses only $\mathrm{poly}\log n$ update time. We also present turnstile streaming algorithms for targeted and general fingerprinting for risk management, where the goal is to determine which features pose the greatest re-identification risk in a dataset. As part of our work, we give a result of independent interest: an algorithm to estimate the complement of the $p^{\text{th}}$ frequency moment of a vector for $p \geq 2$. Empirical evaluation confirms the practicality of our fingerprinting algorithms, demonstrating a speedup of up to $210\times$ over prior work.
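To make the problem statement concrete, the following is a minimal sketch of the classical offline greedy heuristic for maximum coverage, which repeatedly picks the subset covering the most still-uncovered items. It is not the turnstile streaming algorithm of this paper: it assumes all subsets fit in memory and never change. The function name `greedy_max_coverage` and the toy input are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple


def greedy_max_coverage(
    subsets: Dict[Hashable, Set[int]], k: int
) -> Tuple[List[Hashable], Set[int]]:
    """Offline greedy heuristic: pick up to k subsets, each time taking the
    one that adds the most items not yet covered.

    Illustration only; this is NOT the streaming algorithm from the paper.
    """
    covered: Set[int] = set()
    chosen: List[Hashable] = []
    remaining = dict(subsets)  # shallow copy so the caller's dict is untouched

    for _ in range(min(k, len(remaining))):
        # Subset with the largest marginal gain over what is already covered.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining subset adds a new item
        chosen.append(best)
        covered |= remaining.pop(best)

    return chosen, covered


# Toy instance (hypothetical data): d = 4 subsets of the universe {1, ..., 7}, k = 2.
example = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 7}}
chosen, covered = greedy_max_coverage(example, k=2)
print(chosen, len(covered))
```

On this toy instance the greedy rule first takes `C` (four new items) and then `A` (three new items), covering all seven items. In the streaming setting described in the abstract, the subsets cannot be stored explicitly like this, which is what makes the problem harder.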
Lay Summary: Modern datasets often involve enormous amounts of information arriving over time — think of monitoring website visits, social media posts, or network activity. In these settings, it's crucial to make fast, smart decisions using limited memory. This paper tackles the classic maximum coverage problem: imagine you're allowed to pick a few groups (like news sources) to follow, and you want to cover as many different topics (items) as possible. We design the first algorithm that solves this problem in a challenging setting where the data is constantly changing — items can be added or removed from groups — and the algorithm must update its decisions quickly and with very little memory.
Beyond this, we apply our techniques to a timely and important task: figuring out which features in a dataset could make individuals easy to identify. This is vital for protecting privacy in fields like healthcare or finance. Our approach leads to a massive speedup — up to 210 times faster — compared to previous solutions.
Link To Code: https://drive.google.com/drive/folders/1B5-HdBGnvOjrze37Yj1uvHZCuFpmqTqp?usp=sharing
Primary Area: Theory->Optimization
Keywords: maximum coverage, linear sketches, turnstile streams, algorithms
Submission Number: 8987