Keywords: generative models, diffusion models, memorization, privacy, influence functions, Hessian sharpness, data-centric AI, machine unlearning
TL;DR: Layer-wise influence tracing assigns each training image a Hessian-based memorization risk score; removing the top 1% of high-risk images plus a single low-LR fine-tune cuts memorization by ~70% on SD-XL with <1% FID degradation, at a cost of 2.3 GPU-hours.
Abstract: Text-to-image diffusion models can inadvertently memorize and
regenerate unique training images, posing serious privacy and
copyright risks. While recent work links such memorization to sharp
spikes in the model’s log-density Hessian, existing diagnostics stop at
flagging \emph{that} a model overfits, not \emph{which} samples are to
blame or how to remove them. We introduce \emph{layer-wise influence
tracing}, a scalable Hessian decomposition that assigns every training
image a curvature-based influence score. Deleting only the top
$1\%$ high-risk images and performing a single, low-learning-rate
fine-tune cuts verbatim reconstructions in Stable Diffusion XL by
$72\%$ while keeping Fréchet Inception Distance within $1\%$ of the
baseline. The full procedure costs just 2.3 GPU-hours—over an order of
magnitude cheaper than full-Hessian methods—and yields similar gains on
a 1-billion-parameter distilled backbone. Our results turn a coarse
memorization signal into an actionable, data-centric mitigation
strategy, paving the way toward privacy-respecting generative models at
10B+ scale.
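For intuition, a curvature-based influence score of this kind can be read as a layer-wise (block-diagonal) variant of the classical self-influence score. The sketch below is illustrative only, assuming a per-layer Hessian block $H_\ell$ of the training loss and a per-sample gradient $g_\ell(x_i)$; the exact decomposition used by layer-wise influence tracing may differ:
\[
s(x_i) \;=\; \sum_{\ell=1}^{L} g_\ell(x_i)^{\top} H_\ell^{-1}\, g_\ell(x_i),
\qquad
g_\ell(x_i) \;=\; \nabla_{\theta_\ell} \mathcal{L}(x_i;\theta).
\]
Under this reading, the top $1\%$ of training images ranked by $s$ would be the deletion candidates ahead of the single low-learning-rate fine-tune.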
Submission Number: 54