MMDU-Bench: Multi-modal Deep Unlearning Benchmark

Published: 14 Jun 2025, Last Modified: 16 Aug 2025 · MKLM 2025 · CC BY 4.0
Submission Type: Non-archival
Keywords: machine unlearning, multi-modal knowledge graph, large vision-language model
TL;DR: We present MMDU-Bench, the first benchmark for multi-modal deep unlearning, where models must forget both explicit facts and cross-modal inferences. Results show current methods struggle to fully forget entangled knowledge across text and images.
Abstract: Large Vision-Language Models (LVLMs) trained on web-scale data risk memorizing private, harmful, or outdated information, making machine unlearning increasingly important. Prior work mainly targets unimodal settings and isolated fact removal, overlooking the reality that knowledge is often deeply interconnected across modalities such as text and images. We introduce **MMDU-Bench**, the first benchmark for **multi-modal deep unlearning**, where models must forget both explicit facts and implicit inferences drawn through cross-modal reasoning. Built on a large-scale synthetic knowledge graph with over 30k relations and 166k QA pairs, MMDU-Bench enables fine-grained evaluation of both forgetting and retention. Experiments across five representative methods show that most achieve only around 30% Deep Forget Quality, revealing the difficulty of removing entangled knowledge. We also observe large performance gaps between text-only and multi-modal unlearning, as well as a trade-off in which stronger forgetting often leads to loss of related knowledge that should be retained. MMDU-Bench highlights these overlooked challenges and provides a foundation for developing more effective and reliable unlearning methods.
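To make the notion of "deep" forgetting concrete, below is a minimal illustrative sketch (not the paper's actual data schema or metric definition) of how a multi-modal unlearning item and a Deep-Forget-Quality-style score could be structured: the score credits forgetting only when both the explicit fact and the implicit cross-modal inference derived from it are no longer answered correctly. All field names, the `UnlearnQA` class, and the exact-match scoring rule are assumptions for illustration.

```python
# Hypothetical sketch of a multi-modal deep-unlearning benchmark item and a
# Deep-Forget-Quality-style score. Schema and scoring rule are assumptions,
# not MMDU-Bench's actual implementation.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class UnlearnQA:
    question: str               # question posed to the LVLM
    answer: str                 # gold answer before unlearning
    image_path: Optional[str]   # None for text-only questions
    kind: str                   # "explicit" fact or "implicit" cross-modal inference
    in_forget_set: bool         # True if this knowledge should be removed


def deep_forget_quality(items: List[UnlearnQA],
                        model_answer: Callable[[UnlearnQA], str]) -> float:
    """Fraction of forget-set items (explicit AND implicit) that the model
    no longer answers correctly after unlearning. Higher means better forgetting."""
    forget_items = [qa for qa in items if qa.in_forget_set]
    if not forget_items:
        return 1.0
    forgotten = sum(
        model_answer(qa).strip().lower() != qa.answer.strip().lower()
        for qa in forget_items
    )
    return forgotten / len(forget_items)


if __name__ == "__main__":
    # Toy example: the explicit fact is forgotten, but the implicit
    # cross-modal inference still leaks, so the score drops to 0.50.
    items = [
        UnlearnQA("Who is the person in the photo?", "Alice Doe",
                  "img/alice.jpg", "explicit", True),
        UnlearnQA("Which city does the person in the photo live in?", "Paris",
                  "img/alice.jpg", "implicit", True),
    ]
    mock_model = lambda qa: "I don't know" if qa.kind == "explicit" else "Paris"
    print(f"Deep Forget Quality: {deep_forget_quality(items, mock_model):.2f}")
```

The toy run illustrates the entanglement problem the benchmark targets: erasing the surface fact alone is not enough if the model can still reconstruct it through cross-modal reasoning.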
Submission Number: 16