Meta-Learning to Teach Semantic Prompts for Open Domain Generalization in Vision-Language Models

TMLR Paper 3232 Authors

22 Aug 2024 (modified: 07 Nov 2024) · Under review for TMLR · CC BY 4.0
Abstract: Open Domain Generalization (ODG) addresses the challenges posed by domain and category shifts between labeled training sources and unlabeled target domains. Current state-of-the-art methods struggle with the limitations of traditional CNN backbones, leading to reduced generalization and increased error rates in detecting target open samples without prior knowledge. Additionally, recent CLIP-based prompt learning approaches fail to distinguish between known and unknown classes effectively, resulting in suboptimal performance. To address these challenges, we propose MetaPrompt, which leverages the semantic strengths of the vision-language model CLIP and the "learning-to-learn" capabilities of Meta-Learning to achieve robust generalization across domain and category shifts. Our framework introduces three key innovations: First, we approach ODG as a multi-class classification problem that includes both known and novel categories, designing novel prompts capable of detecting unknown class samples across multiple domains. These prompts are trained using Meta-Learning with momentum updates, enabling smooth and accurate differentiation between known and unknown classes. Second, we introduce a novel domain-agnostic semantic attention-based prompt alongside domain-focused prompts to enhance robustness in classifying unknown classes across various domains. Finally, we incorporate an unsupervised contrastive loss during episodic Meta-Training, which reinforces the boundaries in the metric space between known and unknown classes, thereby enhancing "unknown" class awareness in the prompts. MetaPrompt has demonstrated its superiority through extensive testing on diverse datasets, excelling in both closed and open-set DG scenarios and consistently outperforming existing solutions.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:

1) We updated Table 1 by adding new rows of experimental data. The changes are marked in blue in the paper:

&\textcolor{blue}{Meta-MaPLe} & - &\textcolor{blue}{92.35} &\textcolor{blue}{84.44} &\textcolor{blue}{89.42} &\textcolor{blue}{73.56} &\textcolor{blue}{86.33} &\textcolor{blue}{78.61} &\textcolor{blue}{91.23} &\textcolor{blue}{71.69} &\textcolor{blue}{81.79} &\textcolor{blue}{78.75} &\textcolor{blue}{79.48} &\textcolor{blue}{69.81} &\textcolor{blue}{86.77} &\textcolor{blue}{76.14}\\
&\textcolor{blue}{Meta-STYLIP} & - &\textcolor{blue}{95.67} &\textcolor{blue}{88.39} &\textcolor{blue}{93.25} &\textcolor{blue}{78.87} &\textcolor{blue}{91.57} &\textcolor{blue}{87.35} &\textcolor{blue}{92.18} &\textcolor{blue}{72.73} &\textcolor{blue}{84.23} &\textcolor{blue}{83.44} &\textcolor{blue}{86.77} &\textcolor{blue}{78.08} &\textcolor{blue}{90.61} &\textcolor{blue}{81.47}\\
&\textcolor{blue}{Meta-ODG-CLIP} & - &\textcolor{blue}{99.78} &\textcolor{blue}{99.95} &\textcolor{blue}{96.52} &\textcolor{blue}{88.44} &\textcolor{blue}{98.66} &\textcolor{blue}{98.29} &\textcolor{blue}{92.94} &\textcolor{blue}{80.16} &\textcolor{blue}{86.51} &\textcolor{blue}{92.33} &\textcolor{blue}{96.43} &\textcolor{blue}{95.58} &\textcolor{blue}{95.14} &\textcolor{blue}{92.45}\\
\cmidrule(lr){2-17}
&\textcolor{blue}{METAPROMPT + Stable Diffusion} & - &\textcolor{blue}{99.13} &\textcolor{blue}{99.95} &\textcolor{blue}{96.53} &\textcolor{blue}{92.85} &\textcolor{blue}{97.58} &\textcolor{blue}{98.89} &\textcolor{blue}{95.27} &\textcolor{blue}{83.69} &\textcolor{blue}{88.65} &\textcolor{blue}{94.14} &\textcolor{blue}{96.19} &\textcolor{blue}{98.79} &\textcolor{blue}{95.56} &\textcolor{blue}{94.72}\\

2) In the section "Comparison to Literature", under the point "Open-set DG:", we added the following text in blue: \textcolor{blue}{When we combine our method \textsc{MetaPrompt} with the synthetic data generated by the Stable Diffusion process, as used in ODG-CLIP, we observe an increase in the overall accuracy of both the closed-set and open-set setups. This is due to the diverse generation of open- and closed-set samples, which further helps the meta-learning of the prompts to understand openness.}

3) In the section "Ablation Studies", under the subsection "Comparison with prompt-learning based methods under meta-training setup:", we added the following lines: \textcolor{blue}{In \ref{tab_open}, we present a complete quantitative analysis of the efficacy of our meta-training setup over the methods MaPLe, \textsc{StyLIP}, and ODG-CLIP. When ODG-CLIP is trained in our meta-training setup, its overall closed- and open-set accuracy increases compared to its performance without meta-training. This leads to the conclusion that our meta-training setup helps the learnable prompts distinguish between closed and open samples.}

4) In the section "Ablation Studies", we added the following subsection and table comparing parameters: \textcolor{blue}{\noindent\textbf{Computational Complexity:} We provide insights into the computational complexity of our proposed \textsc{MetaPrompt} method. Having demonstrated the efficacy of \textsc{MetaPrompt} in comparison to state-of-the-art ODG methods, we additionally analyze the computational overhead of \textsc{MetaPrompt} on the OfficeHome dataset in Table \ref{tab:flops} below. \textsc{MetaPrompt} has the fewest parameters and the lowest GFLOPs among current state-of-the-art methods, thereby reducing both training and testing time.}

\begin{table}[h!]
\centering
\begin{tabular}{lcccc}
\toprule
\textcolor{blue}{\textbf{Method}} & \textcolor{blue}{\textbf{Parameters (\#M)}} & \textcolor{blue}{\textbf{GFLOPs}} & \textcolor{blue}{\textbf{Training time (min)}} & \textcolor{blue}{\textbf{Testing time (min)}} \\
\midrule
\textcolor{blue}{ODG-CLIP} & \textcolor{blue}{142} & \textcolor{blue}{253.6} & \textcolor{blue}{65} & \textcolor{blue}{1.3} \\
\textcolor{blue}{Meta-ODG-CLIP} & \textcolor{blue}{156} & \textcolor{blue}{255.4} & \textcolor{blue}{70} & \textcolor{blue}{1.3} \\
\textcolor{blue}{Ours} & \textcolor{blue}{12} & \textcolor{blue}{47.5} & \textcolor{blue}{27} & \textcolor{blue}{0.6} \\
\bottomrule
\end{tabular}
\caption{\textcolor{blue}{Comparison of methods in terms of parameters, GFLOPs, training time, and testing time.}}
\label{tab:flops}
\end{table}
Assigned Action Editor: ~Jia-Bin_Huang1
Submission Number: 3232