Keywords: maximum coverage, turnstile streams, sketching
Abstract: In the maximum coverage problem, we are given $d$ subsets of a universe $[n]$, and the goal is to output at most $k$ subsets whose union covers the largest possible number of distinct items. The input can be formalized as an $n \times d$ matrix $A$, where $A_{ij} \neq 0$ if item $i$ is covered by subset $j$ and $A_{ij} = 0$ otherwise. In this paper we give the first linear sketch for the maximum coverage problem. The sketch has size sublinear in the input and is directly applicable to distributed and streaming settings, often offering significant runtime improvements. We focus on the application to the turnstile streaming model, which supports both insertions and deletions. In this model, each update has the form $(i, j, \pm 1)$ and changes $A_{ij}$ to $A_{ij} + 1$ or $A_{ij} - 1$, depending on the sign. Previous work has largely focused on more restrictive models, such as the set-arrival model, where each update reveals an entire column of $A$, or the insertion-only model, which does not allow deletions. We design an algorithm with an $\tilde{O}(d/\varepsilon^3)$ space bound for all $k \geq 0$. When $k$ is constant, this space bound is optimal up to logarithmic factors.
We then turn to fingerprinting for risk measurement. The input is an $n \times d$ matrix $A$ with $n$ users and $d$ features, and the goal is to determine which $k$ features (columns of $A$) together pose the greatest re-identification risk. Our maximum coverage sketch directly yields a solution to targeted fingerprinting for risk measurement. Furthermore, we present a result of independent interest: a linear sketch of the complement of $F_p$, the $p^{\text{th}}$ frequency moment, for $p \geq 2$. We use this sketch to solve general fingerprinting for risk measurement. Empirical evaluation confirms the practicality of our fingerprinting algorithms, demonstrating a speedup of up to $210\times$ over prior work. We also demonstrate that our general fingerprinting algorithm can serve as a dimensionality reduction technique, with an application to more efficient feature selection.
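To make the problem setting concrete, the following is a minimal Python sketch of the turnstile update model and the coverage objective described in the abstract. It is an exact baseline that stores every nonzero entry of $A$, so it uses space linear in the input and is not the sublinear linear sketch the paper proposes. The names `TurnstileMatrix` and `greedy_max_coverage` are hypothetical, and the selection rule is the classical greedy heuristic for maximum coverage, swapped in here only for illustration.

```python
from collections import defaultdict


class TurnstileMatrix:
    """Exact store of A under turnstile updates (i, j, +1 or -1).

    Illustration only: keeps every nonzero entry, so space is linear
    in the input, unlike the sketch described in the abstract.
    """

    def __init__(self):
        self.counts = defaultdict(int)

    def update(self, i, j, delta):
        # Apply A[i][j] += delta; drop the entry once it returns to zero.
        key = (i, j)
        self.counts[key] += delta
        if self.counts[key] == 0:
            del self.counts[key]

    def columns(self):
        # Map each subset j to the set of items i with A[i][j] != 0.
        cols = defaultdict(set)
        for (i, j) in self.counts:
            cols[j].add(i)
        return dict(cols)


def greedy_max_coverage(cols, k):
    """Classical greedy: repeatedly pick the subset with the largest marginal gain."""
    covered, chosen = set(), []
    for _ in range(k):
        candidates = [j for j in sorted(cols) if j not in chosen]
        if not candidates:
            break
        # max() keeps the first maximal element, so ties go to the smallest j.
        best = max(candidates, key=lambda j: len(cols[j] - covered))
        if not cols[best] - covered:
            break  # no remaining subset adds a new item
        chosen.append(best)
        covered |= cols[best]
    return chosen, covered


# Usage: a few insertions followed by one deletion.
A = TurnstileMatrix()
for i, j in [(0, 0), (1, 0), (2, 0), (2, 1), (3, 1), (4, 2), (0, 2)]:
    A.update(i, j, +1)
A.update(2, 0, -1)  # item 2 is no longer covered by subset 0
chosen, covered = greedy_max_coverage(A.columns(), k=2)
```

The deletion is what separates this setting from the insertion-only model: after `A.update(2, 0, -1)`, subset 0 covers only items 0 and 1, which changes the marginal gains seen by the greedy selection.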
Supplementary Material: zip
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3368