Abstract: We consider a new private variant of the Source-Target Clustering (STC) setting, which was introduced by de Mathelin et al. (2022). In STC, there is a target dataset that needs to be clustered by selecting centers, in addition to centers that are already provided in a separate source dataset. The goal is to select centers from the target, such that the target clustering cost given the additional source centers is minimized. We consider private STC, in which the source dataset is private and should only be used under the constraint of differential privacy. This is motivated by scenarios in which the existing centers are private, for instance because they represent individuals in a social network. We derive lower bounds for the private STC objective, illustrating the theoretical limitations on worst-case guarantees for this setting. We then present a differentially private algorithm with asymptotically advantageous results under a data-dependent analysis, in which the guarantee depends on properties of the dataset, as well as more practical variants. We demonstrate in experiments the reduction in clustering cost that is obtained by our practical algorithms compared to baseline approaches.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=zS1JbXZQMV&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)
Changes Since Last Submission: From previous submission: Fonts were fixed to match TMLR template
From previous revision of current submission: fixed local issues pointed out by second reviewer.
From previous revision to current revision: replaced asymptotic notations with constants, used Gaussian mechanism for first algorithm instead of Laplace mechanism, added several clarifications following reviewer comments.
Assigned Action Editor: ~Naman_Agarwal1
Submission Number: 2812
Loading