Layer-wise Influence Tracing: Data-Centric Mitigation of Memorization in Diffusion Models

Published: 10 Jun 2025, Last Modified: 13 Jul 2025 · DIG-BUG Short · CC BY 4.0
Keywords: generative models, diffusion models, memorization, privacy, influence functions, Hessian sharpness, data-centric AI, machine unlearning
TL;DR: Layer-wise influence tracing assigns each training image a Hessian-based memorization-risk score; removing the top 1% plus a single low-LR fine-tune cuts memorization by ~72% on SD-XL with <1% FID degradation, at a cost of 2.3 GPU-hours.
Abstract: Text-to-image diffusion models can inadvertently memorize and regenerate unique training images, posing serious privacy and copyright risks. While recent work links such memorization to sharp spikes in the model’s log-density Hessian, existing diagnostics stop at flagging \emph{that} a model overfits, not \emph{which} samples are to blame or how to remove them. We introduce \emph{layer-wise influence tracing}, a scalable Hessian decomposition that assigns every training image a curvature-based influence score. Deleting only the top $1\%$ high-risk images and performing a single, low-learning-rate fine-tune cuts verbatim reconstructions in Stable Diffusion XL by $72\%$ while keeping Fréchet Inception Distance within $1\%$ of the baseline. The full procedure costs just 2.3 GPU-hours—over an order of magnitude cheaper than full-Hessian methods—and yields similar gains on a 1-billion-parameter distilled backbone. Our results turn a coarse memorization signal into an actionable, data-centric mitigation strategy, paving the way toward privacy-respecting generative models at 10B+ parameter scale.
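
The sketch below is a minimal, hypothetical illustration (not the authors' code) of the data-centric pipeline the abstract outlines: score each training image with a curvature-based influence proxy, drop the highest-risk 1%, and run one low-learning-rate fine-tune. The tiny denoiser, the random stand-in "images", and the whole-model Hutchinson trace used in place of the paper's layer-wise Hessian decomposition are all assumptions made so the example runs end to end; the actual SD-XL training stack is not reproduced here.

```python
# Hypothetical sketch of: (1) per-image curvature scoring, (2) top-1% removal,
# (3) a single low-LR fine-tune. All components are toy stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins for a diffusion backbone and its training set.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))
images = torch.randn(200, 64)  # pretend each row is one training image

def denoise_loss(model, x):
    """Simplified denoising objective: predict the noise added to x."""
    noise = torch.randn_like(x)
    return ((model(x + noise) - noise) ** 2).mean()

def influence_score(model, x, n_probes=4):
    """Hutchinson estimate of the per-sample loss-Hessian trace w.r.t. the
    parameters, used here as an assumed curvature (sharpness) proxy.
    A layer-wise variant would restrict the probes to one layer at a time."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = denoise_loss(model, x.unsqueeze(0))
    grads = torch.autograd.grad(loss, params, create_graph=True)
    trace = 0.0
    for _ in range(n_probes):
        vs = [torch.randn_like(p) for p in params]  # random probe vectors
        hv = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
        trace += sum((h * v).sum() for h, v in zip(hv, vs)).item()
    return trace / n_probes

# 1) Score every training image.
scores = torch.tensor([influence_score(model, img) for img in images])

# 2) Drop the top 1% highest-curvature (most memorization-prone) images.
k = max(1, int(0.01 * len(images)))
keep_mask = torch.ones(len(images), dtype=torch.bool)
keep_mask[scores.topk(k).indices] = False
clean_set = images[keep_mask]

# 3) Single low-learning-rate fine-tune on the filtered data.
opt = torch.optim.AdamW(model.parameters(), lr=1e-6)  # deliberately small LR
for batch in clean_set.split(32):
    opt.zero_grad()
    denoise_loss(model, batch).backward()
    opt.step()
```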
Submission Number: 54