Communication Efficient and Differentially Private Logistic Regression under the Distributed Setting
Abstract: We study the classic machine learning problem of logistic regression with differential privacy (DP) under the distributed setting. While logistic regression with DP has been extensively studied in the literature, most of this research focuses on the centralized setting, where a central server is trusted with the entire private training dataset. However, in many real-world scenarios (e.g., federated learning (FL)), the data is distributed among multiple clients who may not trust one another or the server. While the server tries to learn a model using the clients' private datasets, the clients should provide each individual record in their local datasets with a formal privacy guarantee. Towards this end, we propose a general mechanism for logistic regression with DP under the distributed setting, based on output perturbation. We show that our solution satisfies differential privacy and enjoys privacy amplification by secure aggregation, a recent technique for DP under the distributed setting. In addition, our solution incurs much lower communication costs than existing ones; communication is considered a major overhead in federated learning. In particular, our solution requires the clients to communicate only once throughout the entire FL process. Finally, we provide experimental results on real-world datasets to demonstrate the effectiveness of our solution.
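To make the one-round, output-perturbation idea concrete, the following is a minimal sketch and not the mechanism proposed in the paper. It assumes each client trains an L2-regularized logistic regression locally, adds Gaussian noise to the trained weights, and sends that single vector to the server, which uses only the sum. The function names, the regularization strength `lam`, and the noise scale `sigma` are hypothetical; in particular, `sigma` is not calibrated to any privacy budget here, and the plain sum only stands in for a real secure aggregation protocol.

```python
import numpy as np

def train_local(X, y, lam=0.1, steps=500, lr=0.5):
    """L2-regularized logistic regression by gradient descent; y in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
        grad = X.T @ (p - y) / len(y) + lam * w   # loss gradient plus L2 term
        w -= lr * grad
    return w

def perturbed_update(X, y, sigma, rng, lam=0.1):
    """Output perturbation: train first, then add noise to the final weights.

    sigma is an illustrative noise scale, NOT derived from a DP analysis.
    """
    w = train_local(X, y, lam=lam)
    return w + rng.normal(0.0, sigma, size=w.shape)

def aggregate(updates):
    """Stand-in for secure aggregation: the server only uses the sum."""
    return np.sum(updates, axis=0) / len(updates)

def run_round(client_data, sigma, seed=0):
    """Single communication round: every client sends one noisy vector."""
    rng = np.random.default_rng(seed)
    updates = [perturbed_update(X, y, sigma, rng) for X, y in client_data]
    return aggregate(updates)
```

Calling `run_round(client_data, sigma)` with a list of `(X, y)` pairs returns the averaged noisy model. Because noise is added to the trained weights rather than to per-step gradients, each client communicates exactly once in this sketch.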