Label Privacy Source Coding in Vertical Federated Learning

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Label privacy, Vertical Federated learning, Gradient boosting, Adversarial training
TL;DR: We study label privacy protection in VFL by formulating an offline-phase LPSC problem that encodes the minimum-sufficient label information, and we optimize LPSC via gradient boosting with a privacy guarantee.
Abstract: We study label privacy protection in vertical federated learning (VFL). VFL enables an active party who possesses labeled data to improve model performance (utility) by collaborating with passive parties who hold auxiliary features. Recently, there has been growing concern about protecting label privacy against semi-honest passive parties, who may surreptitiously deduce private labels from the outputs of their bottom models. However, existing studies do not remove, in an offline phase, the label information already contained in the active party's own features, thus unnecessarily leaking label information to passive parties. In contrast to existing methods that focus on training-phase perturbation, we propose a novel offline-phase data cleansing approach that protects label privacy without compromising utility. Specifically, we first formulate a Label Privacy Source Coding (LPSC) problem that removes from the labels the redundant information already present in the active party's features, by assigning each sample a new weight and a new label (i.e., residual) for federated training. We provide a privacy guarantee and theoretically prove that gradient boosting efficiently optimizes the LPSC objective. Accordingly, we propose the Vertical Federated Gradient Boosting (VFGBoost) framework to address the LPSC problem. Moreover, since LPSC only provides an upper-bounded privacy enhancement, VFGBoost further enables a flexible privacy-utility trade-off by incorporating adversarial training during federated training. Experimental results on four real-world datasets substantiate the efficacy of LPSC and the superiority of our VFGBoost framework.
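To make the offline cleansing idea concrete: the abstract describes re-encoding each label as a residual plus a sample weight computed from the active party's own features, so that federated training only targets label information those features cannot explain. The paper's exact procedure is not reproduced here; the snippet below is a minimal illustrative sketch, assuming a standard second-order (LogitBoost-style) residual/weight computation and using a scikit-learn model as a stand-in for the active party's local predictor. All function and variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def lpsc_offline_cleansing(X_active, y):
    """Illustrative offline step (not the paper's exact algorithm):
    fit a local model on the active party's own features, then re-express
    each binary label as a residual and a sample weight, so that federated
    training only targets what X_active cannot explain."""
    y = np.asarray(y, dtype=float)

    # Local prior model trained only on the active party's features.
    prior = GradientBoostingClassifier().fit(X_active, y)
    p = prior.predict_proba(X_active)[:, 1]   # prior label knowledge from X_active

    residual = y - p            # new "label": the part of y not explained locally
    weight = p * (1.0 - p)      # Newton-style (LogitBoost) sample weight
    return residual, weight, prior

# Hypothetical usage: the active party keeps `prior` locally and passes only
# (residual, weight) into federated training with the passive parties.
# residual, weight, prior = lpsc_offline_cleansing(X_active, y)
```

In this sketch the active party never shares its raw labels or its local model; only the residual targets and weights enter federated training, which matches the abstract's goal of withholding label information that the active party's features already capture.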
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2638