Noise-Efficient Learning of Differentially Private Partitioning Machine Ensembles

Zhanliang Huang, Yunwen Lei, Ata Kabán

Published: 01 Jan 2022, Last Modified: 14 May 2023, ECML/PKDD (4) 2022, Readers: Everyone

Abstract: Differentially private decision tree algorithms have been popular since the introduction of differential privacy. While many private tree-based algorithms have been proposed for supervised learning tasks, such as classification, very few extend naturally to the semi-supervised setting. In this paper, we present a framework that takes advantage of unlabelled data to reduce the noise requirement in differentially private decision forests and improve their predictive performance. The main ingredients of our approach are a median splitting criterion that creates balanced leaves, a geometric privacy budget allocation technique, and a random sampling technique to compute the private splitting point accurately. While similar ideas existed in isolation, their combination is new and has several advantages: (1) The semi-supervised mode of operation comes for free. (2) Our framework is applicable in two different privacy settings: when label privacy is required, and when privacy of the features is also required. (3) Empirical evidence on 18 UCI data sets and 3 synthetic data sets demonstrates that our algorithm achieves high utility compared to the current state of the art in both supervised and semi-supervised classification problems.
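
To make two of the ingredients named in the abstract more concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract gives no formulas, so every detail is an assumption made for illustration. The function names (`geometric_budget`, `private_median`), the growth ratio of 2, the direction in which the budget grows with depth, the uniform sampling of candidate split points from a public range `[lo, hi]`, and the use of the exponential mechanism with a rank-based utility are all hypothetical choices.

```python
import bisect
import math
import random


def geometric_budget(total_eps, depth, ratio=2.0):
    """Split total_eps over `depth` tree levels geometrically.

    Assumption: level d gets a share proportional to ratio**d, so the
    shares sum to total_eps. The paper's actual allocation may differ.
    """
    weights = [ratio ** d for d in range(depth)]
    norm = sum(weights)
    return [total_eps * w / norm for w in weights]


def private_median(values, lo, hi, eps, n_candidates=50, rng=None):
    """Choose an approximate median with the exponential mechanism.

    Candidates are drawn uniformly from the public range [lo, hi], so
    they do not depend on the data. The utility of a candidate is the
    negative distance between its rank and n/2; replacing one record
    changes a rank by at most 1, so the sensitivity is taken to be 1.
    """
    rng = rng or random.Random()
    data = sorted(values)
    n = len(data)
    candidates = [rng.uniform(lo, hi) for _ in range(n_candidates)]
    utilities = [-abs(bisect.bisect_left(data, c) - n / 2) for c in candidates]
    best = max(utilities)  # subtracted for numerical stability only
    weights = [math.exp(eps * (u - best) / 2.0) for u in utilities]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Example: budget for a depth-4 tree, then one private split at the root.
budgets = geometric_budget(total_eps=1.0, depth=4)
split = private_median(range(101), lo=0, hi=100, eps=budgets[0],
                       rng=random.Random(0))
```

A split chosen this way tends to land near the true median, which is what keeps the resulting leaves roughly balanced; a smaller `eps` makes the choice noisier.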
