Differentially Private Analysis for Binary Response Models: Optimality, Estimation, and Inference

Ce Zhang; Yixin Han; Yafei Wang; Xiaodong Yan; Linglong Kong; Ting Li; Bei Jiang

Published: 01 May 2025, Last Modified: 24 Jul 2025 | ICML 2025 poster | CC BY 4.0

Abstract: Randomized response (RR) mechanisms constitute a fundamental and effective technique for ensuring label differential privacy (LabelDP). However, existing RR methods primarily focus on the response labels while overlooking the influence of covariates and often do not fully address optimality. To address these challenges, this paper explores optimal LabelDP procedures using RR mechanisms, focusing on achieving optimal estimation and inference in binary response models. We first analyze the asymptotic behaviors of RR binary response models and then optimize the procedure by maximizing the trace of the Fisher Information Matrix within the $\varepsilon$- and $(\varepsilon,\delta)$-LabelDP constraints. Our theoretical results indicate that the proposed methods achieve optimal LabelDP guarantees while maintaining statistical accuracy in binary response models under mild conditions. Furthermore, we develop private confidence intervals with nominal coverage for statistical inference. Extensive simulation studies and real-world applications confirm that our methods outperform existing approaches in terms of precise estimation, privacy protection, and reliable inference.
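For readers unfamiliar with randomized response, the following is a minimal sketch of the classical symmetric RR mechanism for binary labels. It is not the optimized procedure proposed in the paper. It assumes each label is flipped independently with probability $1/(1+e^{\varepsilon})$, which satisfies $\varepsilon$-LabelDP, and it ignores covariates entirely. The function names (`flip_probability`, `randomize_labels`, `debiased_mean`) and the simulated data are hypothetical and chosen only for illustration.

```python
import math
import random


def flip_probability(epsilon):
    """Probability of reporting the opposite label under symmetric RR (assumed form)."""
    return 1.0 / (1.0 + math.exp(epsilon))


def randomize_labels(labels, epsilon, rng):
    """Flip each binary label independently with probability q = 1 / (1 + e^epsilon)."""
    q = flip_probability(epsilon)
    return [1 - y if rng.random() < q else y for y in labels]


def debiased_mean(private_labels, epsilon):
    """Unbiased estimate of the true label mean from privatized labels."""
    q = flip_probability(epsilon)
    observed = sum(private_labels) / len(private_labels)
    # E[observed] = q + (1 - 2q) * true_mean, so invert that affine map.
    return (observed - q) / (1.0 - 2.0 * q)


# Illustrative data only: simulated labels with a true positive rate near 0.3.
rng = random.Random(0)
true_labels = [1 if rng.random() < 0.3 else 0 for _ in range(20000)]
private_labels = randomize_labels(true_labels, epsilon=1.0, rng=rng)
estimate = debiased_mean(private_labels, epsilon=1.0)
```

The debiasing step shows why noisy labels can still support accurate estimation: the flip probability is known, so its effect on the mean can be inverted, at the cost of higher variance when $\varepsilon$ is small.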

Lay Summary: In today's data-driven world, it’s essential to protect people’s sensitive information, like personal survey answers, while still allowing scientists to draw useful conclusions. One common way to do this is through “randomized response,” a technique that deliberately adds noise to protect individual answers. The challenge is how to still draw accurate conclusions from such noisy data. Our paper presents a new method that balances privacy and accuracy more effectively. We design an improved system that works especially well when the responses are “yes or no” (binary), and we show that it performs better than older methods, especially in real-world tasks like detecting plagiarism in student surveys. We also create a way to give researchers confidence intervals, a key tool in statistics, even when the data are privatized. This work helps ensure that privacy doesn't come at the cost of scientific reliability, making it valuable for fields like health, education, and social science.

Primary Area: Social Aspects->Privacy

Keywords: Binary response model, Differential privacy, Inference, Optimality, Randomized response

Submission Number: 5188
