Abstract: Differential Privacy (DP) is the de facto standard for reasoning about the privacy of a training algorithm. Yet, learning with DP often yields poor performance unless one trains on a large dataset. In this paper, we instead outline how training on less data can be beneficial when we are only interested in defending against specific attacks; we take the canonical example of defending against membership inference. To arrive at this result, we first derive (tight) bounds on the success of all membership inference attacks. These bounds do not replace DP; rather, they introduce a complementary interpretation of a DP algorithm's ability to defend against membership inference specifically. Because our bound more tightly captures the effect of how training data was selected, we can show that decreasing the sampling rate when constructing the training dataset has a disparate effect on the bound when compared to strengthening the DP guarantee. Thus, when the privacy protection we care about is defending against membership inference, training on less data can yield more advantageous trade-offs between preventing membership inference and utility than strengthening the DP guarantee. We empirically illustrate this on MNIST, CIFAR10 and SVHN-extended.
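To give a feel for the trade-off the abstract describes, the following is a minimal numeric sketch. It does not use the paper's tight bound; instead it combines two standard, well-known results as stand-ins: the folklore upper bound e^ε/(1+e^ε) on the accuracy of any membership inference attack against an ε-DP mechanism (with a balanced prior), and the privacy-amplification-by-subsampling formula ε' = log(1 + q(e^ε − 1)) to model training on a smaller, subsampled dataset. The function names and the concrete values of ε and q are illustrative assumptions only.

```python
import math


def mi_accuracy_bound(eps: float) -> float:
    """Standard folklore bound on membership-inference accuracy for an
    eps-DP mechanism with a balanced prior: e^eps / (1 + e^eps).
    This is NOT the paper's tighter bound; it is used only for illustration."""
    return math.exp(eps) / (1.0 + math.exp(eps))


def subsampled_eps(eps: float, q: float) -> float:
    """Privacy amplification by Poisson subsampling at rate q:
    eps' = log(1 + q * (e^eps - 1)). A standard result, used here to model
    the effect of constructing the training set from less data."""
    return math.log(1.0 + q * (math.exp(eps) - 1.0))


base_eps = 2.0  # hypothetical baseline DP guarantee

# Option A: strengthen the DP guarantee by halving epsilon.
print("halve eps:      ", mi_accuracy_bound(base_eps / 2))   # ~0.731

# Option B: keep epsilon fixed but subsample the training data.
print("sample q = 0.5: ", mi_accuracy_bound(subsampled_eps(base_eps, 0.5)))  # ~0.808
print("sample q = 0.1: ", mi_accuracy_bound(subsampled_eps(base_eps, 0.1)))  # ~0.621
```

Under these stand-in formulas, an aggressive reduction of the sampling rate (q = 0.1) pushes the membership-inference accuracy bound below what halving ε achieves, illustrating (in spirit, not in the paper's exact quantities) why training on less data can trade off differently against utility than strengthening the DP guarantee itself.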
Submission Length: Long submission (more than 12 pages of main content)
Supplementary Material: zip
Assigned Action Editor: ~Yu-Xiang_Wang1
Submission Number: 1541