Analyzing Preference Data With Local Privacy: Optimal Utility and Enhanced Robustness

Published: 01 Jan 2023, Last Modified: 04 Aug 2023, IEEE Trans. Knowl. Data Eng. 2023
Abstract: Online service providers benefit from collecting and analyzing preference data from users, including both implicit preference data (e.g., the videos a user has watched) and explicit preference data (e.g., ranking data over candidates). However, this practice raises ethical and legal issues of data privacy. In this paper, we study the problem of aggregating individuals' preference data in the local differential privacy (LDP) setting. A naive approach is to add Laplace random noise, which, however, suffers from low statistical utility and is fragile to LDP-specific poisoning attacks. We therefore propose a novel mechanism, the *additive mechanism*, that improves utility and robustness simultaneously. The additive mechanism randomly outputs a subset of candidates with probability proportional to their total scores. For preference data under the Borda rule over $d$ items, its mean squared error bound is improved from $O(\frac{d^{5}}{n\epsilon^{2}})$ to $O(\frac{d^{4}}{n\epsilon^{2}})$, and its maximum poisoning risk bound is reduced from $+\infty$ to $O(\frac{d^{2}}{n\epsilon})$. We also investigate minimax lower bounds for $\epsilon$-LDP preference data aggregation and prove that the error rate of $O(\frac{d^{4}}{n\epsilon^{2}})$ is optimal for the Borda rule. Experimental results validate that our proposed approaches reduce estimation error by $50\%$ on average and are more robust to adversarial poisoning attacks.
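For context, below is a minimal sketch of the naive Laplace baseline mentioned in the abstract (not the paper's additive mechanism): each user perturbs their local Borda score vector with Laplace noise calibrated to the L1 sensitivity of a single ranking, and the aggregator averages the noisy reports. Function names, the sensitivity bound, and parameter choices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def borda_scores(ranking):
    """Borda score vector: among d items, the item ranked first gets d-1
    points and the last gets 0. `ranking` lists item indices from most to
    least preferred."""
    d = len(ranking)
    scores = np.zeros(d)
    for pos, item in enumerate(ranking):
        scores[item] = d - 1 - pos
    return scores

def laplace_ldp_report(ranking, epsilon):
    """Naive epsilon-LDP report (assumed baseline): add Laplace noise to the
    user's Borda score vector. The L1 distance between the score vectors of
    any two rankings is at most roughly d^2/2 (a ranking vs. its reverse),
    so that is used here as the sensitivity."""
    d = len(ranking)
    sensitivity = d * d / 2.0
    noise = np.random.laplace(scale=sensitivity / epsilon, size=d)
    return borda_scores(ranking) + noise

def aggregate(reports):
    """Unbiased estimate of the average Borda score per item."""
    return np.mean(reports, axis=0)

# Usage: n users, each holding a private ranking over d items.
rng = np.random.default_rng(0)
d, n, epsilon = 5, 10000, 1.0
rankings = [rng.permutation(d) for _ in range(n)]
estimate = aggregate([laplace_ldp_report(r, epsilon) for r in rankings])
```

The per-user noise scale grows like $d^{2}/\epsilon$, which is consistent with the $O(\frac{d^{5}}{n\epsilon^{2}})$ baseline rate cited in the abstract; the paper's additive mechanism instead releases a random subset of candidates chosen with probability proportional to their total scores, improving this to $O(\frac{d^{4}}{n\epsilon^{2}})$ while bounding the poisoning risk.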