PILLAR: How to make semi-private learning more effective

Published: 07 Mar 2024, Last Modified: 07 Mar 2024, SaTML 2024
Keywords: Differential Privacy, Semi-Supervised learning, PAC learning, low dimensions
Abstract: In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data. We propose PILLAR, an easy-to-implement and computationally efficient algorithm that, under mild assumptions on the data, provably achieves significantly lower private labelled sample complexity and can be run efficiently on real-world datasets. The key idea is to use the public data to estimate the principal components of the pre-trained features and subsequently project the private dataset onto the top-$k$ principal components. We empirically validate the effectiveness of our algorithm in a wide variety of experiments under tight privacy constraints ($\epsilon < 1$) and probe its performance in low-data regimes and when the pre-training distribution differs significantly from the one on which SP learning is performed. Despite its simplicity, our algorithm significantly outperforms, in all of these settings, all available baselines that use similar amounts of public data, even though those baselines are often more computationally expensive [1]-[3]. For example, on CIFAR-100 with $\epsilon=0.1$, our algorithm improves over the most competitive baselines by a factor of at least two.
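The projection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes pre-trained features are already extracted as NumPy arrays, and the function name `pillar_project` and the toy dimensions are made up for this example. The principal components are estimated from the public features only, so this step itself consumes no privacy budget; a differentially private learner (e.g. DP-SGD or output perturbation, per the paper's setup) would then be trained on the low-dimensional private features.

```python
import numpy as np

def pillar_project(public_feats, private_feats, k):
    """Project private features onto the top-k principal components
    estimated from public (unlabelled) features.

    public_feats:  (n_public, d) array of pre-trained features, public data
    private_feats: (n_private, d) array of pre-trained features, private data
    k:             number of principal components to keep
    """
    # Centre using the public mean; no private data touches this estimate.
    mean = public_feats.mean(axis=0)
    centered = public_feats - mean
    # SVD of the centred public features; rows of Vt are principal directions.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:k]  # shape (k, d)
    # Project the private features onto the top-k public components.
    return (private_feats - mean) @ components.T  # shape (n_private, k)

# Toy usage with synthetic features (dimensions chosen arbitrarily).
rng = np.random.default_rng(0)
pub = rng.normal(size=(500, 64))
priv = rng.normal(size=(50, 64))
low_dim = pillar_project(pub, priv, k=8)
print(low_dim.shape)  # (50, 8)
```

Reducing the dimension from $d$ to $k$ before private training is what drives the improved sample complexity: the noise a DP learner must add typically scales with the dimension of the space it optimises over.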
Submission Number: 87