POPS: Recovering Unlearned Multi-Modality Knowledge in MLLMs with Fine-tuning and Prompt-based Attacks
Keywords: Multimodal Unlearning and Attack, MLLM
Abstract: Multimodal Large Language Models (MLLMs) have demonstrated impressive performance on cross-modal tasks by jointly training on large-scale textual and visual data, in which privacy-sensitive examples may be intentionally or unintentionally encoded, raising concerns about privacy and copyright violations. To this end, Multi-modality Machine Unlearning (MMU) was proposed as a mitigation that forces MLLMs to forget private information. Yet, the robustness of such unlearning has not been fully examined once the model is published and accessible to malicious users. In this paper, we propose a novel adversarial strategy, namely Prompt-Optimized Parameter Shaking (POPS), which aims to recover unlearned multi-modality knowledge via fine-tuning. Our method steers victim MLLMs to generate potential private examples through prompt optimization, then fine-tunes the MLLMs on the synthesized examples so that they regenerate the private information. Experiments on multiple MMU benchmarks reveal substantial weaknesses in existing MMU algorithms: our attacks achieve near-complete recovery of supposedly erased sensitive information, exposing fundamental vulnerabilities that challenge the very foundations of current multimodal privacy protection.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 15091