Is My Data Safe? Predicting Instance-Level Membership Inference Success for White-box and Black-box Attacks

Published: 28 Jun 2024 · Last Modified: 25 Jul 2024 · NextGenAISafety 2024 Poster · CC BY 4.0
Keywords: Privacy, Membership Inference
TL;DR: We put a recently proposed MI attack with optimality guarantees to the test in practice and look for efficient strategies to identify the most vulnerable instances.
Abstract: We perform an extensive empirical investigation of three recent membership inference (MI) attacks on vision and language models. Our investigation includes the newly proposed Gradient Likelihood Ratio (GLiR) attack, a white-box attack with theoretical optimality guarantees. Prior research has suggested that white-box attacks cannot outperform black-box MI attacks. In this work, we challenge this hypothesis by running and evaluating this attack on real-world models with up to 53M parameters for the first time. We find that this white-box attack does indeed have the potential to outperform other attacks. We subsequently focus on the problem of MI susceptibility prediction, which is concerned with efficiently identifying, a priori, the individuals most susceptible to attack. By doing so, we uncover which characteristics make instances susceptible to MI and whether the targeted instances are the same across attacks with different access (e.g., white-box or black-box) to the target model. We implement and study over 20 predictors of attack success. We find that GLiR mostly targets the same points as loss-based attacks and that the vulnerable instances can be efficiently predicted.
Submission Number: 60
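
For context, the loss-based attacks the abstract compares GLiR against reduce to thresholding a model's per-example loss: training points tend to have atypically low loss, so low loss predicts "member". Below is a minimal, self-contained sketch of such a black-box loss-thresholding attack. Everything here (the tiny MLP, synthetic data, and median-loss calibration) is an illustrative assumption, not the paper's implementation.

```python
# Minimal sketch of a black-box loss-thresholding MI attack.
# Assumption: all model/data specifics are synthetic stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in target model and data (illustrative, not from the paper).
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
members = torch.randn(256, 20)        # points the model is trained on
member_labels = torch.randint(0, 2, (256,))
non_members = torch.randn(256, 20)    # held-out points
non_member_labels = torch.randint(0, 2, (256,))

# Fit the model on the members so their loss becomes atypically low.
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    F.cross_entropy(model(members), member_labels).backward()
    opt.step()

@torch.no_grad()
def per_example_loss(x, y):
    # Black-box access: only losses, no gradients of the target model.
    return F.cross_entropy(model(x), y, reduction="none")

member_loss = per_example_loss(members, member_labels)
non_member_loss = per_example_loss(non_members, non_member_labels)

# One simple calibration choice: threshold at the non-member median.
threshold = non_member_loss.median()
tpr = (member_loss < threshold).float().mean()
fpr = (non_member_loss < threshold).float().mean()
print(f"TPR {tpr:.2f} at FPR {fpr:.2f}")
```

The instances scoring far below the threshold are the "vulnerable" points; the susceptibility predictors studied in the paper aim to flag such points cheaply, without mounting the full attack.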