Privacy Leakage via Output Label Space and Differentially Private Continual Learning

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: differential privacy, privacy-preserving machine learning, continual learning, image classification, pre-trained models
Abstract: Differential privacy (DP) is a formal privacy framework that enables training machine learning (ML) models while protecting individuals' data. As prior work has pointed out, ML models are part of larger systems, which can give rise to so-called privacy side-channels even when the model training itself is DP. We identify the output label space of a classification model as one such privacy side-channel and demonstrate a concrete privacy attack that exploits it. This side-channel becomes especially relevant in continual learning (CL), where the output label space changes over time. We propose and evaluate two methods for eliminating it: applying an optimal DP mechanism to release the labels present in the sensitive data, and using a large public label space. We explore the trade-offs of these methods by adapting pre-trained models.
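The abstract mentions releasing the labels present in the sensitive data under a DP mechanism; the paper's specific "optimal" mechanism is not described here. As a purely illustrative sketch (not the authors' method), the following shows one standard way to release a label set with ε-DP over a known public label domain: add Laplace noise to per-label counts and release only labels whose noisy count exceeds a threshold. The function name, threshold value, and the assumption that each individual contributes at most one labelled example (so the count histogram has L1 sensitivity 1) are illustrative assumptions.

```python
import numpy as np


def dp_release_label_set(labels, label_domain, epsilon, threshold):
    """Release the set of labels occurring in a sensitive dataset under epsilon-DP.

    Illustrative sketch, not the paper's mechanism. Assumes each individual
    contributes at most one labelled example, so the label-count histogram has
    L1 sensitivity 1. Laplace noise is added to every count over a known,
    public label domain; a label is released only if its noisy count exceeds
    the threshold. The released set is post-processing of the Laplace
    mechanism and therefore satisfies epsilon-DP.
    """
    counts = {c: 0 for c in label_domain}
    for y in labels:
        counts[y] += 1
    noisy = {c: n + np.random.laplace(scale=1.0 / epsilon) for c, n in counts.items()}
    return {c for c, n in noisy.items() if n > threshold}


# Example: frequent labels survive the noisy threshold; a rare label that
# would identify an individual is unlikely to be released.
released = dp_release_label_set(
    labels=["cat"] * 50 + ["dog"] * 40 + ["rare_condition"] * 1,
    label_domain=["cat", "dog", "rare_condition", "bird"],
    epsilon=1.0,
    threshold=10.0,
)
print(released)
```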
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 7327