Abstract: We study the problem of identifying whether a target sample was included in the training of a neural network (i.e., member vs. non-member). This problem, known as membership inference, raises security and privacy concerns for machine learning. In this work, we prove a separation law of membership privacy between one- and two-layer networks: the latter preserve provably less membership privacy against confidence-based attacks than the former. We also establish the phenomenon of confidence collapse in two-layer networks, in which all samples of the same class receive exactly the same confidence score. Our results are two-fold: a) gradient methods on two-layer ReLU networks converge to a confidence-collapsed solution, so that the attacker can classify members and non-members with perfect precision and recall; b) under the same assumptions as in a), for one-layer networks there exists a training dataset on which confidence collapse does not occur and the attacker fails to classify all members and non-members correctly.
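To make the attack model concrete, below is a minimal sketch of a confidence-based membership inference attack: the attacker declares a sample a member if the model's confidence on it exceeds a threshold. This is an illustration, not the paper's construction; the synthetic scores, the threshold tau, and the idealized collapse (all members sharing one confidence value) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical confidence scores. Under confidence collapse, every member
# of a class receives exactly the same confidence, so a single threshold
# separates members from non-members with perfect precision and recall.
member_conf = np.full(100, 0.98)                  # collapsed member scores
nonmember_conf = rng.uniform(0.2, 0.9, size=100)  # varied non-member scores

def threshold_attack(conf, tau):
    """Predict membership: 1 (member) if confidence >= tau, else 0."""
    return (conf >= tau).astype(int)

conf = np.concatenate([member_conf, nonmember_conf])
labels = np.concatenate([np.ones(100), np.zeros(100)])

tau = 0.95  # any threshold between the largest non-member and the member score works
preds = threshold_attack(conf, tau)

tp = np.sum((preds == 1) & (labels == 1))
precision = tp / max(preds.sum(), 1)  # guard against zero predicted members
recall = tp / labels.sum()
print(f"precision={precision:.2f}, recall={recall:.2f}")  # 1.00 and 1.00 here
```

Without collapse (result b), member and non-member confidence distributions overlap, so no single threshold classifies every sample correctly.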
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: N/A
Assigned Action Editor: ~Varun_Kanade1
Submission Number: 706