Approximate Domain Unlearning for Vision-Language Models

Kodai Kawamura; Yuta Goto; Rintaro Yanagi; Hirokatsu Kataoka; Go Irie

Approximate Domain Unlearning for Vision-Language Models

Kodai Kawamura, Yuta Goto, Rintaro Yanagi, Hirokatsu Kataoka, Go Irie

Published: 18 Sept 2025, Last Modified: 29 Oct 2025NeurIPS 2025 spotlightEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Approximate unlearning, Vision-Language Models

TL;DR: We propose Approximate Domain Unlearning (ADU), a new task for selectively removing domain-specific knowledge from pre-trained vision-language models, and introduce two key techniques, domain disentangling and instance-wise prompting, to solve it.

Abstract: Pre-trained Vision-Language Models (VLMs) exhibit strong generalization capabilities, enabling them to recognize a wide range of objects across diverse domains without additional training. However, they often retain irrelevant information beyond the requirements of specific target downstream tasks, raising concerns about computational efficiency and potential information leakage. This has motivated growing interest in approximate unlearning, which aims to selectively remove unnecessary knowledge while preserving overall model performance. Existing approaches to approximate unlearning have primarily focused on {\em class unlearning}, where a VLM is retrained to fail to recognize specified object classes while maintaining accuracy for others. However, merely forgetting object classes is often insufficient in practical applications. For instance, an autonomous driving system should accurately recognize {\em real} cars, while avoiding misrecognition of {\em illustrated} cars depicted in roadside advertisements as {\em real} cars, which could be hazardous. In this paper, we introduce {\em Approximate Domain Unlearning (ADU)}, a novel problem setting that requires reducing recognition accuracy for images from specified domains (e.g., {\em illustration}) while preserving accuracy for other domains (e.g., {\em real}). ADU presents new technical challenges: due to the strong domain generalization capability of pre-trained VLMs, domain distributions are highly entangled in the feature space, making naive approaches based on penalizing target domains ineffective. To tackle this limitation, we propose a novel approach that explicitly disentangles domain distributions and adaptively captures instance-specific domain information. Extensive experiments on four multi-domain benchmark datasets demonstrate that our approach significantly outperforms strong baselines built upon state-of-the-art VLM tuning techniques, paving the way for practical and fine-grained unlearning in VLMs. Code : https://kodaikawamura.github.io/Domain_Unlearning/.

Supplementary Material: zip

Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)

Submission Number: 274

Loading