Private Set Union with Multiple Contributions

Published: 18 Sept 2025, Last Modified: 29 Oct 2025. NeurIPS 2025 spotlight. License: CC BY 4.0
Keywords: Differential Privacy, Set Union, Partition Selection
TL;DR: We study the limits of differentially private set union (or partition selection) when each user can contribute multiple items.
Abstract: In the private set union problem each user owns a bag of at most $k$ items (from some large universe of items), and we are interested in computing the union of the items in the bags of all of the users. This is trivial without privacy, but a differentially private algorithm must be careful about reporting items contained in only a small number of bags. We consider differentially private algorithms that always report a subset of the union, and define the utility of an algorithm to be the expected size of the subset that it reports. Because the achievable utility varies significantly with the dataset, we introduce the *utility ratio*, which normalizes utility by a dataset-specific upper bound and characterizes a mechanism by its lowest normalized utility across all datasets. We then develop algorithms with guaranteed utility ratios and complement them with bounds on the best possible utility ratio. Prior work has shown that a single algorithm can be simultaneously optimal for all datasets when $k=1$, but we show that instance-optimal algorithms do not exist when $k>1$, and characterize how performance degrades as $k$ grows. At the same time, we design a private algorithm that achieves the maximum possible utility, regardless of $k$, when the item histogram matches a prior prediction (for instance, from a previous data release) and degrades gracefully with the $L_\infty$ distance between the prediction and the actual histogram when the prediction is imperfect.
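To make the problem setting concrete, the following is a minimal sketch of a textbook Laplace-thresholding baseline for private set union. It is not the algorithm proposed in this paper. The function name `laplace_threshold_union`, its parameters, and the threshold formula are assumptions made for illustration: each bag is deduplicated and capped at $k$ items, each item count receives Laplace noise of scale $k/\varepsilon$, and an item is reported only if its noisy count exceeds $1 + (k/\varepsilon)\ln(k/(2\delta))$, a standard choice for $(\varepsilon, \delta)$-differential privacy under add/remove-one-user neighbors.

```python
import math
import random
from collections import Counter


def laplace_threshold_union(bags, k, epsilon, delta, rng=None):
    """Illustrative baseline (an assumption, not the paper's method).

    Reports a subset of the union of the users' bags by thresholding
    noisy item counts.
    """
    if k < 1 or epsilon <= 0 or not (0 < delta < 0.5):
        raise ValueError("need k >= 1, epsilon > 0 and 0 < delta < 0.5")
    rng = rng or random.Random()

    # One user changes at most k counts by 1 each, so the L1 sensitivity is k.
    scale = k / epsilon
    # An item held by a single user passes with probability at most delta / k.
    threshold = 1.0 + scale * math.log(k / (2.0 * delta))

    counts = Counter()
    for bag in bags:
        # Deduplicate and cap each contribution at k items.
        for item in sorted(set(bag))[:k]:
            counts[item] += 1

    released = set()
    for item, count in counts.items():
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        if count + noise > threshold:
            released.add(item)
    return released


# Hypothetical data: two very common items and one item held by a single user.
bags = [["a", "b"]] * 1000 + [["rare"]]
reported = laplace_threshold_union(bags, k=2, epsilon=1.0, delta=1e-6,
                                   rng=random.Random(0))
print(len(reported))  # utility on this run: number of items reported
```

Only items that appear in at least one bag are ever considered, so the output is always a subset of the union, matching the class of algorithms described in the abstract. Because the noise scale and the threshold both grow with $k$, this baseline suppresses more of the rarely held items as users are allowed to contribute more.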
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 17733