Towards Non-destructive Privacy Protection for LVLMs via Node-Level Localized Editing

16 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Large Vision-Language Model, Model Editing, Privacy Protection
TL;DR: A privacy risk mitigation algorithm based on localized feature model editing.
Abstract: Large Vision-Language Models (LVLMs) have shown remarkable potential across a wide range of vision tasks and are broadly deployed in sectors such as finance and medicine. However, they remain open to abuse: attackers can leverage these models to extract private information, creating security vulnerabilities for their deployment. Studies show that LVLMs struggle to consistently refuse privacy-compromising instructions from users. Existing privacy protection research focuses primarily on safeguarding training data, aiming to prevent models from leaking sensitive information contained within it. Privacy leakage, however, extends beyond training data: a model may be misused to extract private information from images or to infer sensitive location details. Protecting against this kind of external privacy leakage has received little attention. To address it, we introduce PRN-Edit, a privacy risk mitigation method based on model editing. PRN-Edit strengthens a model's privacy protection by increasing its rate of refusal on privacy-related questions, and it generalizes to novel sensitive questions not seen during the mitigation process. It works by learning a feature mask that locates privacy risk nodes in the feature encoding of user instructions; these nodes then precisely guide the update of model parameters. Comprehensive experiments on MiniGPT-4 and LLaVA-1.5 show that our algorithm significantly improves privacy protection while preserving model utility.
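The abstract describes the mechanism only at a high level. Below is a minimal PyTorch sketch of the two ingredients it names: a learnable feature mask that scores "privacy risk nodes" in the instruction encoding, and a parameter update restricted to those nodes. All names here (`PrivacyNodeMask`, `risk_nodes`, `masked_update`) are illustrative assumptions, not the paper's implementation; the actual training objective and layer choice are not specified in this page.

```python
# Hypothetical sketch of a node-level localized edit; not the authors' code.
import torch
import torch.nn as nn


class PrivacyNodeMask(nn.Module):
    """Learnable soft mask over the hidden features of a chosen layer.

    A sigmoid-relaxed mask m in [0, 1]^d is trained (e.g., with a refusal
    objective plus a sparsity penalty) so that features receiving large mask
    values are treated as 'privacy risk nodes'.
    """

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(hidden_dim))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        mask = torch.sigmoid(self.logits)   # soft mask in [0, 1]
        return features * mask              # keep only the masked features

    def risk_nodes(self, top_k: int) -> torch.Tensor:
        # Indices of the features most associated with privacy-related behavior.
        return torch.topk(torch.sigmoid(self.logits), top_k).indices


def masked_update(weight: torch.Tensor, grad: torch.Tensor,
                  node_idx: torch.Tensor, lr: float = 1e-4) -> None:
    """Apply a gradient step only to the rows tied to risk nodes,
    leaving the rest of the layer untouched (the 'localized' part of the edit)."""
    row_mask = torch.zeros(weight.shape[0], 1, device=weight.device)
    row_mask[node_idx] = 1.0
    with torch.no_grad():
        weight -= lr * grad * row_mask
```

Under these assumptions, restricting the update to a small set of located nodes is what would keep the edit "non-destructive": parameters outside the identified privacy risk nodes are left unchanged, which is consistent with the paper's claim of maintaining utility while improving refusal behavior.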
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 6922