TL;DR: A Black-Box Membership Inference Attack Algorithm for Diffusion Models
Abstract: Given the rising popularity of AI-generated art and the associated copyright concerns, identifying whether an artwork was used to train a diffusion model is an important research topic. This work approaches the problem from the membership inference attack (MIA) perspective. We first identify a key limitation of applying existing MIA methods to proprietary diffusion models: they require access to the model's internal U-Net.
To address this problem, we introduce a novel membership inference attack method that uses only the image-to-image variation API and operates without access to the model's internal U-Net. Our method is based on the intuition that the model can more easily obtain an unbiased noise prediction estimate for images from the training set. By applying the API multiple times to the target image, averaging the outputs, and comparing the result to the original image, our approach classifies whether a sample was part of the training set. We validate our method on DDIM and Stable Diffusion setups and further extend both our approach and existing algorithms to the Diffusion Transformer architecture. In our experiments, the proposed method consistently outperforms previous approaches.
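Below is a minimal sketch of the query-average-compare procedure described in the abstract, assuming a hypothetical black-box callable `img2img_api(image)` that returns one image-to-image variation of the input; the function names, number of queries, distance metric, and threshold calibration are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def membership_score(image: np.ndarray, img2img_api, n_queries: int = 10) -> float:
    """Return a reconstruction-distance score; lower suggests the image was a training member."""
    # Query the black-box image-to-image variation API several times on the same target image.
    variations = [img2img_api(image).astype(np.float64) for _ in range(n_queries)]
    # Average the outputs to approximate the model's expected reconstruction of the image.
    mean_variation = np.mean(variations, axis=0)
    # Compare the averaged reconstruction to the original image (L2 distance used here for illustration).
    return float(np.linalg.norm(mean_variation - image.astype(np.float64)))

def is_member(image: np.ndarray, img2img_api, threshold: float) -> bool:
    # Classify as a training member if the reconstruction error falls below a threshold
    # calibrated on images known to be outside the training set (an assumed calibration step).
    return membership_score(image, img2img_api) < threshold
```

The intuition is that for training-set images the averaged variations concentrate near the original, so the score is small, while for non-members the average drifts further away.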
Lay Summary: AI image generators often train on artists’ works without consent, raising copyright concerns. We introduce a way to detect if a specific image was used in training, even when the model is a black box. Our method repeatedly applies the model’s image-editing tool to the same image and averages the outputs. If the model has seen the image before, the outputs tend to look more consistent. This pattern helps us identify training images without needing access to the model’s internal code, offering a new way to protect creators’ rights.
Primary Area: Social Aspects->Safety
Keywords: diffusion model, membership inference attack
Submission Number: 8310