End-to-End Privacy-Preserving Vertical Federated Learning using Private Cross-Organizational Data Collaboration

27 May 2024 (modified: 26 Aug 2024) · Submitted to FedKDD 2024 · CC BY 4.0
Keywords: Vertical Federated Learning
Abstract: As data utilization advances across organizations in various fields, the insights that data yields are more diverse when the data is sourced through collaboration across different organizations than from a single organization. However, such cross-organizational data collaboration raises privacy concerns. Federated learning, a method of building a machine learning (ML) model from data distributed across organizations, protects privacy by sharing only the model parameters and the information necessary for model updates, without sharing the data each organization holds. It has been pointed out, however, that training data may be leaked even from the gradients needed for model updates. To prevent such privacy leakage, local differential privacy can be applied, in which each organization adds noise to the gradients it shares during model training. Local differential privacy, however, requires a large amount of noise, which degrades model accuracy. In this paper, we propose a method that reduces the impact of noise compared to conventional federated learning by leveraging private cross-organizational data collaboration, called Private Cross-aggregation Technology (PCT). PCT combines Private Set Intersection Cardinality, a Trusted Execution Environment, and Differential Privacy to output a cross-tabulation table that is private from input to output. Our method consists of two steps: (1) creating a private cross-tabulation table using PCT, and (2) training an ML model on the private cross-tabulation table. In our implementation, we train a Naive Bayes classifier as the ML model. To confirm the effectiveness of the proposed method, we evaluated classification accuracy for both the proposed method and DP-SGD, a privacy-preserving training method for deep learning used in federated learning.
We confirmed that the classification accuracy of the proposed method is higher when the privacy budget is limited.
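The two-step idea in the abstract can be illustrated with a minimal sketch: privatize a cross-tabulation table with the Laplace mechanism (standing in for the full PCT pipeline, which also involves Private Set Intersection Cardinality and a Trusted Execution Environment), then estimate Naive Bayes parameters from the noisy counts. The table values, the privacy budget, and the smoothing choice below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cross-tabulation: rows = feature values held by one organization,
# columns = class labels held by the other (hypothetical counts).
cross_tab = np.array([[30.0, 10.0],
                      [ 5.0, 25.0]])

# Step 1 (sketch): privatize the table with the Laplace mechanism.
# Assuming each individual contributes to exactly one cell, the
# L1 sensitivity of the table is 1.
epsilon = 1.0  # illustrative privacy budget
noisy_tab = cross_tab + rng.laplace(scale=1.0 / epsilon, size=cross_tab.shape)
noisy_tab = np.clip(noisy_tab, 0.0, None)  # counts cannot be negative

# Step 2 (sketch): Naive Bayes parameters from the noisy counts,
# with add-one smoothing to avoid zero probabilities.
class_counts = noisy_tab.sum(axis=0)
priors = class_counts / class_counts.sum()
likelihoods = (noisy_tab + 1.0) / (class_counts + noisy_tab.shape[0])

# Posterior over classes for an example whose feature value is 0.
posterior = priors * likelihoods[0]
posterior /= posterior.sum()
```

Because only the (already privatized) table leaves the collaboration, the classifier training itself adds no further privacy cost; this is the mechanism by which the method avoids the per-iteration noise of gradient-based approaches such as DP-SGD.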
Submission Number: 4