Position: Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models

Published: 01 Jul 2025, Last Modified: 09 Jul 2025 · ICML 2025 R2-FM Workshop Poster · CC BY 4.0
Keywords: membership inference attack, model distillation, generative model
Abstract: To detect unauthorized use of training data in large-scale generative models, membership inference attacks (MIAs) have proven effective at distinguishing a single training instance (i.e., a member) from a single non-training instance (i.e., a non-member). This success relies on a memorization effect: because models overfit their training data, they tend to perform better on a member than on a non-member. However, we find that standard MIAs fail against distilled generative models (i.e., student models), which are commonly deployed for efficiency. The reason is that student models are trained exclusively on data generated by large-scale generative models (i.e., teacher models) and thus never see the teacher's original training data, nullifying the memorization effect. This finding reveals a serious privacy loophole: a generation service provider could deploy a student model whose teacher was trained on unauthorized data, yet claim the deployed model is "clean" because it was never directly trained on that data. To close this loophole, we uncover a memory chain that persists through distillation: the student's output distribution aligns more closely with the teacher's members than with its non-members, which makes unauthorized data use detectable. This leads us to posit that MIAs on distilled generative models should shift from instance-level scores to distribution-level statistics. We further propose three principles for distribution-based MIAs that detect unauthorized training data through distilled generative models, and validate our position with an exemplar framework. We conclude by discussing the implications of our position.
Submission Number: 101
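The abstract's core claim is that membership should be tested at the distribution level: a distilled student's output distribution sits closer to the teacher's member data than to non-member data. As an illustration only (the paper's exemplar framework and three principles are not reproduced here), the sketch below scores a candidate dataset by comparing the maximum mean discrepancy (MMD) between student-generated samples and the candidate set against the MMD to a reference non-member set. The function names, the Gaussian-kernel MMD choice, and the toy data are assumptions made for exposition, not the authors' method.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """RBF kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared maximum mean discrepancy between sample sets x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def distributional_membership_score(student_samples, candidate_set, reference_set, bandwidth=1.0):
    """
    Distribution-level MIA signal (illustrative): if the student's output distribution
    is closer (smaller MMD) to the candidate set than to a reference non-member set,
    the candidate set is flagged as likely having been used to train the teacher.
    Returns a score > 0 when the candidate set looks like member data.
    """
    d_candidate = mmd2(student_samples, candidate_set, bandwidth)
    d_reference = mmd2(student_samples, reference_set, bandwidth)
    return d_reference - d_candidate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 2-D stand-ins: the student imitates a teacher that was trained on "member" data.
    member_data = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
    non_member_data = rng.normal(loc=1.5, scale=1.0, size=(500, 2))
    student_samples = member_data + rng.normal(scale=0.3, size=(500, 2))  # distilled outputs

    score = distributional_membership_score(student_samples, member_data, non_member_data)
    print(f"membership score: {score:.4f} (positive => candidate set flagged as member data)")
```

A positive score indicates that the student's outputs are distributionally closer to the candidate set than to the reference set, which the abstract's "memory chain" argument would treat as evidence that the candidate data was used to train the teacher; note that no individual instance is scored, in contrast to standard instance-level MIAs.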