Protecting Label Distribution in Cross-Silo Federated Learning

Yangfan Jiang, Xinjian Luo, Yuncheng Wu, Xiaokui Xiao, Beng Chin Ooi

Published: 01 Jan 2024, Last Modified: 25 Jan 2025SP 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Federated learning (FL) is a popular distributed machine learning (ML) framework in which multiple parties share their model parameters instead of the raw training datasets to construct a global model in a privacy-preserving manner. However, existing FL solutions mainly focus on protecting the privacy of individual training records by incorporating differential privacy (DP), while overlooking the protection of the distribution information of training datasets, despite the fact that data distribution is also regarded as highly sensitive in high-stakes applications.In this paper, we propose the first privacy-preserving stochastic gradient descent (SGD) algorithm for protecting label distribution in FL. To establish a formal privacy guarantee, we formalize a privacy notion, dubbed (m,γ,ξ)-label distributional privacy, to quantify label distributional privacy leakage. Subsequently, we design the label distribution perturbation mechanism (LDPM) that carefully incorporates randomness into the SGD algorithm to achieve (m,γ,ξ)-label distributional privacy for all one-vs-all classification models. LDPM is easy to implement and provides non-trivial privacy guarantees, making it a suitable drop-in replacement for existing FL local model training algorithms. Notably, we demonstrate that LDPM also ensures DP, indicating that LDPM offers both individual and label distributional privacy guarantees. Extensive experiments on six benchmark datasets validate the effectiveness of LDPM.