Abstract: Gradient Boosting Decision Trees (GBDTs) are popular machine learning models due to their simplicity, effectiveness, and interpretability. Recently, to alleviate serious privacy leakage in conventional centralized methods, researchers have proposed several privacy-preserving distributed GBDT solutions. However, these approaches still suffer from either insufficient privacy protection or significant runtime and communication overhead. In this paper, we propose an efficient, end-to-end privacy-preserving distributed GBDT framework, called PPD-GBDT, which uses differential privacy, polynomial approximation, and fully homomorphic encryption to achieve comprehensive privacy protection. Specifically, during the boosting phase, we design a novel model preparation method that improves prediction efficiency with acceptably small accuracy/RMSE loss while preventing corruption by data owners. For the prediction phase, we propose a customized secure prediction method that effectively prevents a malicious server from stealing private information. In addition, we conduct extensive experiments on six datasets and compare against three prior schemes. Evaluation results show that our privacy-preserving scheme achieves lower runtime and up to 40× less communication overhead than state-of-the-art approaches.