Keywords: differential privacy, auditing, metagradient optimization
Abstract: In this work we study black-box privacy auditing, where the goal is to lower bound the privacy parameter
of a differentially private learning algorithm using only the algorithm’s outputs (i.e., the final trained model).
For DP-SGD (the most successful method for training differentially private deep learning models), the
canonical approach to auditing uses membership inference: an auditor constructs a small set of special “canary” examples, inserts a random subset of them into the training set, and then tries to discern which canaries were actually included (typically via a membership inference attack). The auditor’s success rate then provides a lower bound on the privacy parameters of the learning algorithm. Our main contribution is a method for optimizing the auditor’s canary set to improve privacy auditing, leveraging recent work on metagradient optimization. Our empirical evaluation demonstrates that with such optimized canaries, we can improve empirical lower bounds for differentially private image classification models by over 2x in certain instances. Furthermore, we demonstrate that our method is transferable and efficient: canaries optimized for non-private SGD on a small model architecture remain effective when auditing larger models trained with DP-SGD.
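For illustration, the final step of this style of audit (turning attack outcomes into a lower bound on epsilon) can be sketched as follows. This is a minimal sketch, assuming a threshold-based membership inference attack and Clopper-Pearson confidence intervals together with the standard DP hypothesis-testing bound TPR <= e^eps * FPR + delta; the function names and parameters here are hypothetical and are not the paper's implementation.

```python
import numpy as np
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    # Two-sided Clopper-Pearson confidence interval for a binomial rate k/n.
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

def audit_epsilon_lower_bound(scores_in, scores_out, threshold,
                              delta=1e-5, alpha=0.05):
    # scores_in:  attack scores for canaries inserted into training
    # scores_out: attack scores for held-out canaries
    # A score above `threshold` counts as a "member" guess. DP hypothesis
    # testing implies TPR <= exp(eps) * FPR + delta, so a high-confidence
    # lower bound on TPR and upper bound on FPR certify
    # eps >= log((TPR_lo - delta) / FPR_hi).
    tp = int(np.sum(np.asarray(scores_in) > threshold))
    fp = int(np.sum(np.asarray(scores_out) > threshold))
    tpr_lo, _ = clopper_pearson(tp, len(scores_in), alpha)
    _, fpr_hi = clopper_pearson(fp, len(scores_out), alpha)
    if tpr_lo - delta <= 0.0 or fpr_hi <= 0.0:
        return 0.0  # attack too weak to certify any positive epsilon
    return max(0.0, float(np.log((tpr_lo - delta) / fpr_hi)))

# Toy usage with synthetic scores; in practice the scores come from a
# membership inference attack (e.g., per-canary loss) on the trained model.
rng = np.random.default_rng(0)
scores_in = rng.normal(1.0, 1.0, size=500)   # members score higher on average
scores_out = rng.normal(0.0, 1.0, size=500)
print(audit_epsilon_lower_bound(scores_in, scores_out, threshold=2.0))
```

In this sketch, canary optimization would only change how `scores_in` and `scores_out` separate: better canaries yield a stronger attack, and hence a larger certified lower bound on epsilon.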
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 19138