On Data Distribution Leakage in Cross-Silo Federated Learning

Published: 01 Jan 2024, Last Modified: 25 Jan 2025. IEEE Trans. Knowl. Data Eng. 2024. License: CC BY-SA 4.0.
Abstract: Federated learning (FL) has emerged as a promising privacy-preserving machine learning paradigm, enabling data owners to collaboratively train a joint model by sharing model parameters instead of private training data. However, recent studies reveal privacy risks in FL by showing that private training data can be inferred from shared model parameters. Consequently, differential privacy (DP) has been incorporated into FL to safeguard training data. Nevertheless, DP does not provide a strong theoretical guarantee for protecting the data distribution, which is also highly sensitive in cross-silo FL scenarios, as it may reflect the business secrets of data owners. In this article, we develop two attack methods to investigate the potential risks of data distribution leakage in differentially private cross-silo FL. We highlight that an honest-but-curious server can successfully infer both the feature and label distributions of each party's training data without any background knowledge. Specifically, the first attack applies when models are differentiable, while the second attack targets non-differentiable classification models. Extensive experiments on six benchmark datasets validate the effectiveness of the proposed attacks. The results demonstrate that the state-of-the-art DP-SGD algorithm remains vulnerable to inference attacks on the data distribution, emphasizing the necessity of designing more advanced privacy-preserving FL frameworks.
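To make the leakage channel concrete, the following is a minimal sketch, not the paper's actual attack, of why shared gradients can expose a party's label distribution. It relies on a standard fact: for a softmax classifier trained with cross-entropy, the batch-averaged gradient of the output-layer bias equals the mean predicted probability minus the batch's empirical label frequency. A curious server that observes this gradient and approximates the mean prediction (e.g., as near-uniform early in training) can therefore read off the label skew. All names, dimensions, and the synthetic data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D, B = 5, 16, 256  # classes, feature dim, batch size (illustrative)

# A party's private batch with a deliberately skewed label distribution.
labels = rng.choice(C, size=B, p=[0.5, 0.2, 0.15, 0.1, 0.05])
X = rng.normal(size=(B, D))

# Linear softmax classifier, standing in for the model's output layer,
# whose parameters the server knows because they are shared in FL.
W = rng.normal(scale=0.1, size=(D, C))
b = np.zeros(C)

logits = X @ W + b
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)  # softmax predictions

Y = np.eye(C)[labels]              # one-hot labels
grad_b = (p - Y).mean(axis=0)      # bias gradient the server observes

# Identity: grad_b = mean(p) - label_freq, so label_freq = mean(p) - grad_b.
# A real server never sees the private inputs and must approximate mean(p),
# e.g., as the uniform vector 1/C early in training.
freq_est = p.mean(axis=0) - grad_b
freq_true = Y.mean(axis=0)
print("true label frequencies:     ", np.round(freq_true, 3))
print("estimated label frequencies:", np.round(freq_est, 3))
```

The recovery is exact here only because the script computes mean(p) itself; a real server would approximate it, and DP-SGD's clipping and noise perturb grad_b. The abstract's point is that such perturbation, at practical privacy budgets, does not reliably hide distribution-level signals of this kind.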