Keywords: PATE, programmatically generated data, data-free knowledge distillation
TL;DR: We eliminate PATE's reliance on in-distribution public data by identifying a synergy between programmatically generated data and data-free knowledge distillation.
Abstract: The PATE algorithm is one of the canonical approaches to private machine learning. It uses an ensemble of teachers trained on a private dataset to label a public dataset, enabling knowledge transfer from the teachers to a student model under differential privacy guarantees. However, PATE's reliance on public data from the same distribution as the private data poses a fundamental limitation, particularly in domains such as healthcare and finance, where such public data is typically unavailable. In this work, we propose DIET-PATE, which overcomes this limitation by identifying a synergy between programmatically generated data and data-free knowledge distillation. The programmatically generated data serves two critical purposes: first, pretraining both the teacher ensemble and the student model on this data significantly enhances overall performance, as it removes the need to learn generic feature representations solely from the private dataset; second, by substituting for the public dataset during knowledge transfer, it entirely removes the need for in-distribution data. To correct the resulting distributional shift in the models' hidden-layer activations, we incorporate data-free knowledge distillation, which aligns these activations and ensures reliable knowledge transfer. Our experiments demonstrate that DIET-PATE closely matches the performance of standard PATE despite the absence of in-distribution public data. Furthermore, we show that our approach extends seamlessly to a distributed setting in which each teacher model is trained by a different entity. By eliminating the need for public data, we make PATE and its distributed derivatives practically applicable to sensitive domains.
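To make the two mechanisms in the abstract concrete, here is a minimal sketch: a GNMax-style noisy vote aggregation (the Gaussian noisy-argmax mechanism from the PATE line of work) and a student update that adds a feature-alignment term in the spirit of data-free knowledge distillation. The function names, the `(logits, activations)` student interface, and the `sigma`/`alpha` parameters are illustrative assumptions; the abstract does not specify DIET-PATE's exact alignment objective, so a generic activation-matching MSE stands in for it.

```python
import torch
import torch.nn.functional as F


def noisy_aggregate(teacher_votes: torch.Tensor, sigma: float, num_classes: int) -> torch.Tensor:
    """GNMax-style aggregation: add Gaussian noise to the per-class vote
    histogram and release only the noisy argmax label for each query.

    teacher_votes: LongTensor of shape (num_teachers, num_queries) holding
    each teacher's predicted class for every (here: synthetic) query.
    """
    # Per-query histogram of teacher votes, shape (num_queries, num_classes).
    hist = F.one_hot(teacher_votes.t(), num_classes).sum(dim=1).float()
    hist = hist + sigma * torch.randn_like(hist)  # noise calibrated to the DP budget
    return hist.argmax(dim=1)  # privately labeled queries for the student


def distill_step(student, teacher_activations, batch, noisy_labels, alpha=1.0):
    """One student update on programmatically generated data.

    Hypothetical interface: `student` returns both logits and hidden-layer
    activations; the MSE alignment term is a generic stand-in for the
    paper's data-free knowledge distillation objective.
    """
    logits, student_activations = student(batch)
    hard_loss = F.cross_entropy(logits, noisy_labels)  # supervised by noisy teacher votes
    align_loss = F.mse_loss(student_activations, teacher_activations)  # correct the distribution shift
    return hard_loss + alpha * align_loss
```

Under one plausible reading of the abstract, a full pipeline would pretrain both the teachers and the student on the programmatic data, fine-tune each teacher on its private shard, and then train the student only on synthetic queries labeled via `noisy_aggregate`, with the alignment term compensating for the synthetic data being out of distribution.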
Submission Number: 15