POPS: Recovering Unlearned Multi-Modality Knowledge in MLLMs with Fine-tuning and Prompt-based Attacks
Keywords: Multimodal Unlearning and Attack, MLLM
Abstract: Multimodal Large Language Models (MLLMs) have demonstrated impressive performance on cross-modal tasks by jointly training on large-scale textual and visual data, in which privacy-sensitive examples may be intentionally or unintentionally encoded, raising concerns about privacy and copyright violations. To this end, Multi-modality Machine Unlearning (MMU) was proposed as a mitigation that forces MLLMs to forget private information. Yet, the robustness of such unlearning has not been fully examined once the model is published and accessible to malicious users. In this paper, we propose a novel adversarial strategy, namely Prompt-Optimized Parameter Shaking (POPS), which aims to recover unlearned multi-modality knowledge via fine-tuning. Our method steers victim MLLMs to generate potential private examples through prompt optimization, then fine-tunes the MLLMs on the synthesized examples so that they regenerate the private information. Experiments on multiple MMU benchmarks reveal substantial weaknesses in existing MMU algorithms: our attacks achieve near-complete recovery of supposedly erased sensitive information, exposing fundamental vulnerabilities that challenge the very foundations of current multimodal privacy protection.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 15091