Automating Evaluation of Diffusion Model Unlearning with (Vision-) Language Model World Knowledge

Eric Yeats; Darryl Hannan; Henry Kvinge; Timothy Doster; Scott Mahan

Automating Evaluation of Diffusion Model Unlearning with (Vision-) Language Model World Knowledge

Eric Yeats, Darryl Hannan, Henry Kvinge, Timothy Doster, Scott Mahan

Published: 11 Jun 2025, Last Modified: 01 Jul 2025MUGen @ ICML 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Machine Unlearning, Generative AI, Concept Erasure, Robustness, Evaluation

Abstract: Machine unlearning (MU) is a promising cost-effective method to cleanse undesired information (concepts, biases, or patterns) from foundational diffusion models. While MU is orders of magnitude less costly than retraining a diffusion model without the undesired information, it can be challenging and labor-intensive to prove that the information has been fully removed from the model. Moreover, MU can damage diffusion model performance on surrounding concepts that the user would like to retain, making it unclear if the diffusion model is still fit for deployment. We introduce an automated MU evaluation tool which leverages (vision-) language models (LM) to robustly evaluate "unlearned" diffusion models for user-specified unlearning scenarios using red-teaming strategies. Given a target concept, the tool extracts structured, relevant world knowledge from the LM which is then used to thoroughly quantify the effectiveness of unlearning and the damage incurred to nearby concepts. We use our automated tool to evaluate popular diffusion model unlearning methods, revealing cases where typical handwritten evaluations lead to inaccurate assessments of unlearning performance.

Submission Number: 21

Loading