Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: Extrapolated guidance from pretrained to fine-tuned DMs enables strong fine-tuning data extraction.
Abstract: Diffusion Models (DMs) have evolved into advanced image generation tools, especially for few-shot fine-tuning, where a pretrained DM is fine-tuned on a small set of images to capture specific styles or objects. Many users upload these personalized checkpoints online, fostering communities on platforms such as Civitai and HuggingFace. However, model owners may overlook the potential data-leakage risks of releasing their fine-tuned checkpoints. Moreover, concerns regarding copyright violations arise when unauthorized data is used during fine-tuning. In this paper, we ask: "Can training data be extracted from these fine-tuned DMs shared online?" A successful extraction would not only demonstrate a data-leakage threat but also offer tangible evidence of copyright infringement. To answer this, we propose FineXtract, a framework for extracting fine-tuning data. Our method approximates fine-tuning as a gradual shift in the model's learned distribution---from the original pretrained DM toward the fine-tuning data. By extrapolating between the models before and after fine-tuning, we guide generation toward high-probability regions of the fine-tuned data distribution. We then apply a clustering algorithm to extract the most probable images from those generated under this extrapolated guidance. Experiments on DMs fine-tuned on datasets such as WikiArt and DreamBooth, as well as on real-world checkpoints posted online, validate the effectiveness of our method, which extracts approximately 20% of the fine-tuning data in most cases, significantly surpassing baseline performance. The code is available.
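To make the extrapolation idea concrete, here is a minimal, hypothetical sketch of the guided sampling step, assuming diffusers-style UNets and a standard scheduler. The names `unet_pre`, `unet_ft`, and the guidance scale `w` are illustrative assumptions based on the abstract's description, not the authors' exact implementation (see the linked repository for that).

```python
import torch

@torch.no_grad()
def sample_with_extrapolated_guidance(unet_pre, unet_ft, scheduler,
                                      latents, text_emb, w=3.0):
    """Denoise `latents` while steering generation toward the fine-tuning
    data distribution by extrapolating from the pretrained model's noise
    prediction toward the fine-tuned model's.

    unet_pre / unet_ft: pretrained and fine-tuned denoisers with a
    diffusers-style UNet2DConditionModel call signature (assumed here).
    w: guidance scale; w = 0 recovers plain sampling from the fine-tuned DM.
    """
    for t in scheduler.timesteps:
        x = scheduler.scale_model_input(latents, t)
        eps_pre = unet_pre(x, t, encoder_hidden_states=text_emb).sample
        eps_ft = unet_ft(x, t, encoder_hidden_states=text_emb).sample
        # Extrapolate past the fine-tuned prediction, away from the
        # pretrained one, pushing samples toward regions whose probability
        # rose the most during fine-tuning.
        eps = eps_ft + w * (eps_ft - eps_pre)
        latents = scheduler.step(eps, t, latents).prev_sample
    return latents
```

Per the abstract, many candidates generated this way would then be clustered (e.g., by pairwise image similarity), with representatives of the densest clusters taken as the extracted training images.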
Lay Summary: Diffusion Models (DMs) are powerful tools for generating images. Many people use these models to create personalized art by fine-tuning them on a small set of images, such as pictures of specific objects or artistic styles. They often share these fine-tuned models online through platforms like Civitai or HuggingFace, but few realize this could unintentionally leak the images used for fine-tuning. This raises privacy concerns and risks of copyright violations if the original data wasn’t meant to be shared. In this paper, we explore whether it’s possible to extract the original training data from these publicly shared fine-tuned models. We introduce a method called FineXtract, which works by guiding the generation process using the differences between the fine-tuned model and the original model. This helps us recover images that likely resemble the original fine-tuning data. Our approach can recover around 20% of the original fine-tuning images in many cases, raising important questions about privacy and copyright in generative AI.
Link To Code: https://github.com/Nicholas0228/FineXtract
Primary Area: Social Aspects->Privacy
Keywords: Data Extraction, Copyright Protection, Privacy and Security, Diffusion Models, Trustworthy AI
Submission Number: 6670