A Unified Framework for Diffusion Model Unlearning with f-Divergence

Nicola Novello; Federico Fontana; Luigi Cinque; Deniz Gunduz; Andrea M Tonello

20 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: machine unlearning, diffusion models, f-divergence

Abstract: Machine unlearning aims to remove specific knowledge from a trained model. While diffusion models (DMs) have shown remarkable generative capabilities, existing unlearning methods for text-to-image (T2I) models often rely on minimizing the mean squared error (MSE) between the output distributions of a target concept and an anchor concept. We show that this MSE-based approach is a special case of a unified $f$-divergence-based framework, in which any $f$-divergence can be used. We analyze the benefits of using different $f$-divergences, which mainly affect the convergence properties of the algorithm and the quality of unlearning. The proposed framework offers a flexible paradigm for selecting the divergence best suited to a specific application, balancing the trade-off between aggressive unlearning and concept preservation.
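
For readers unfamiliar with the term, an $f$-divergence between distributions $P$ and $Q$ is defined as $D_f(P \| Q) = \mathbb{E}_{Q}[f(dP/dQ)]$ for a convex $f$ with $f(1) = 0$. The sketch below is illustrative only and is not taken from the paper: it evaluates a few standard generators on discrete distributions, and it includes the textbook closed form for the KL divergence between two Gaussians with equal isotropic variance, which is one well-known way a squared-error term can arise from a divergence. The names `f_divergence`, `GENERATORS`, and `gaussian_kl_equal_var` are hypothetical, and whether the paper's derivation follows this route is an assumption.

```python
import math

def f_divergence(p, q, f):
    """D_f(P||Q) = sum_x q(x) * f(p(x)/q(x)) for discrete P, Q.

    Assumes every q(x) > 0 and every p(x) > 0 (illustrative helper,
    not the paper's implementation).
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

# Standard convex generators with f(1) = 0. Constant factors for the
# Hellinger distance vary by convention.
GENERATORS = {
    "kl": lambda t: t * math.log(t),
    "reverse_kl": lambda t: -math.log(t),
    "squared_hellinger": lambda t: (math.sqrt(t) - 1.0) ** 2,
    "pearson_chi2": lambda t: (t - 1.0) ** 2,
}

def gaussian_kl_equal_var(mu_p, mu_q, sigma):
    """KL(N(mu_p, sigma^2 I) || N(mu_q, sigma^2 I)) = ||mu_p - mu_q||^2 / (2 sigma^2).

    With equal variances, KL reduces to a scaled squared error between means.
    """
    sq_err = sum((a - b) ** 2 for a, b in zip(mu_p, mu_q))
    return sq_err / (2.0 * sigma ** 2)

if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.25, 0.75]
    for name, f in GENERATORS.items():
        print(name, f_divergence(p, q, f))
    print("gaussian KL:", gaussian_kl_equal_var([1.0, 2.0], [0.0, 0.0], 1.0))
```

Each generator weights the density ratio differently, so the same pair of distributions yields different divergence values; the choice of $f$ is the degree of freedom the abstract refers to.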

Supplementary Material: zip

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 24471
