Abstract: Dataset auditing for machine learning (ML)
models is a method to evaluate if a given dataset is used
in training a model. In a Federated Learning setting where
multiple institutions collaboratively train a model with their
decentralized private datasets, dataset auditing can facilitate
the enforcement of regulations, which provide rules for
preserving privacy, but also allow users to revoke authorizations
and remove their data from collaboratively trained
models. This paper first proposes a set of requirements for a
practical dataset auditing method, and then present a novel
dataset auditing method called Ensembled Membership Auditing
(EMA). Its key idea is to leverage previously proposed
Membership Inference Attack methods and to aggregate
data-wise membership scores using statistic testing to audit
a dataset for a ML model. We have experimentally evaluated
the proposed approach with benchmark datasets,
as well as 4 X-ray datasets (CBIS-DDSM, COVIDx, Child-
XRay, and CXR-NIH) and 3 dermatology datasets (DERM7pt,
HAM10000, and PAD-UFES-20). Our results show that EMA
meet the requirements substantially better than the previous
state-of-the-art method.
0 Replies
Loading