Is EMA Robust? Examining the Robustness of Data Auditing and a Novel Non-calibration Extension

Published: 27 Oct 2023, Last Modified: 06 Nov 2024 · RegML 2023
Keywords: Machine unlearning, Data auditing
Abstract: Auditing data usage in machine learning models is crucial for regulatory compliance, especially with sensitive data such as medical records. In this study, we scrutinize potential vulnerabilities in an established baseline method, Ensembled Membership Auditing (EMA), which employs membership inference attacks to determine whether a specific model was trained on a particular dataset. We identify a novel False Negative Error Pattern that arises when EMA is applied to large datasets, particularly under adversarial defenses such as dropout, model pruning, and MemGuard. Our analysis across three datasets shows that larger convolutional models pose a greater challenge for EMA, but a novel metric-set analysis improves performance by up to $5\%$. To extend the applicability of our improvements, we introduce EMA-Zero, a GAN-based dataset auditing method that does not require an external calibration dataset. Notably, EMA-Zero performs comparably to EMA using synthetic calibration data generated from as few as 100 samples.
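To make the auditing setup concrete, the sketch below illustrates the general shape of an EMA-style dataset audit: per-sample membership-inference scores from the target model are thresholded using calibration data, and the votes are aggregated with a statistical test to decide whether the queried dataset was used in training. This is a minimal illustrative sketch, not the authors' implementation; the function name, the confidence-based metric, the midpoint threshold, and the toy data are all assumptions made for demonstration.

```python
# Illustrative sketch of an EMA-style dataset audit (not the paper's code).
# Assumptions: `target_confidences` are the target model's confidence scores on the
# queried dataset; `calib_member` / `calib_nonmember` are calibration-model scores
# on known member / non-member samples.
import numpy as np
from scipy import stats

def ema_style_audit(target_confidences, calib_member, calib_nonmember, alpha=0.05):
    """Return True if the queried dataset is inferred to be part of the training data."""
    # 1. Calibrate a per-sample membership threshold from the calibration sets
    #    (here simply the midpoint of the two calibration means, a simplification).
    threshold = 0.5 * (np.mean(calib_member) + np.mean(calib_nonmember))

    # 2. Cast per-sample membership votes on the queried dataset.
    votes = (np.asarray(target_confidences) >= threshold).astype(float)

    # 3. Aggregate: test whether the vote rate exceeds the non-member base rate.
    nonmember_votes = (np.asarray(calib_nonmember) >= threshold).astype(float)
    _, p_value = stats.ttest_ind(votes, nonmember_votes, alternative="greater")
    return p_value < alpha  # dataset flagged as "used in training"

# Toy usage with synthetic confidence scores.
rng = np.random.default_rng(0)
flagged = ema_style_audit(
    target_confidences=rng.beta(8, 2, 500),   # high confidence: likely members
    calib_member=rng.beta(8, 2, 500),
    calib_nonmember=rng.beta(2, 2, 500),
)
print("Dataset flagged as used in training:", flagged)
```

In this framing, EMA-Zero would replace the external calibration sets with scores computed on GAN-generated samples, removing the need for a separate calibration dataset.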
Submission Number: 9