Abstract: Prompt tuning adapts Vision-Language Models like CLIP to open-world tasks with minimal training costs. In this direction, one typical paradigm evaluates model performance **separately** on known classes (*i.e.*, the base domain) and unseen classes (*i.e.*, the new domain). However, real-world scenarios require models to handle inputs **without prior domain knowledge**. This practical challenge has spurred the development of **open-world prompt tuning**, which demands a unified evaluation across two stages: 1) detecting whether an input belongs to the base or the new domain (**P1**), and 2) classifying the sample into its correct class (**P2**). Moreover, since domain distributions are generally unknown, a proper metric should be insensitive to varying base/new sample ratios (**P3**). However, we find that current metrics, including the harmonic mean (HM), overall accuracy, and AUROC, fail to satisfy these three properties simultaneously. To bridge this gap, we propose $\mathsf{OpenworldAUC}$, a unified metric that jointly assesses detection and classification through pairwise instance comparisons. To optimize $\mathsf{OpenworldAUC}$ effectively, we introduce **Gated Mixture-of-Prompts (GMoP)**, which employs domain-specific prompts and a gating mechanism to dynamically balance detection and classification. Theoretical analysis guarantees the generalization of GMoP under practical conditions. Experiments on 15 benchmarks in open-world scenarios show that GMoP achieves state-of-the-art performance on $\mathsf{OpenworldAUC}$ and other metrics.
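To make the pairwise construction concrete, here is a minimal sketch of how such a metric could be computed. This is an illustrative assumption, not the paper's exact formulation: we assume each base/new pair contributes only when the detector ranks the new sample above the base one and both samples are correctly classified within their own domains; the function name `openworld_auc` and its signature are hypothetical.

```python
import numpy as np

def openworld_auc(base_scores, base_correct, new_scores, new_correct):
    """Illustrative pairwise-comparison metric (hypothetical, not the paper's code).

    base_scores / new_scores   : detector scores; higher means "more likely new domain".
    base_correct / new_correct : boolean arrays; True if the sample is classified
                                 into its ground-truth class within its own domain.
    """
    total = 0.0
    for s_b, c_b in zip(base_scores, base_correct):
        for s_n, c_n in zip(new_scores, new_correct):
            # A pair counts only if the new sample is ranked above the base
            # sample (detection, P1) and both are correctly classified (P2).
            total += float(c_b) * float(s_n > s_b) * float(c_n)
    return total / (len(base_scores) * len(new_scores))
```

Because the score averages over base/new *pairs* rather than over a pooled sample set, its value does not depend on how many base versus new samples happen to appear, which is exactly what property **P3** asks of a metric.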
Lay Summary: When adapting large vision-language models like CLIP to real-world applications, it's not enough to just do well on known categories — the model must also handle unfamiliar inputs without knowing whether they belong to familiar or new domains. This problem is called open-world prompt tuning.
Existing evaluation methods usually split the problem into two separate parts: detecting whether the input comes from a known or an unknown class, and then classifying it. But in real use these two steps are tightly coupled, and traditional evaluation metrics fail to capture this coupling.
To solve this, we introduce OpenworldAUC, a new evaluation metric that jointly considers detection and classification and is insensitive to how many known or unknown examples appear. We also propose GMoP, a method that learns separate prompts for different domains and uses a gating mechanism to decide which prompt to apply. Our approach works reliably under realistic conditions and achieves strong performance across 15 benchmark datasets.
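As a rough intuition for the gating idea, the sketch below routes an input between two prompt branches with a hard threshold. Everything here is assumed for illustration: the function `gmop_predict`, the threshold `tau`, and the hard gate itself (the actual GMoP gate may be learned and soft); only the notion of domain-specific prompts plus a gate comes from the summary above.

```python
import numpy as np

def gmop_predict(image_feat, base_text_feats, new_text_feats, gate_score, tau=0.5):
    """Hypothetical inference sketch of a gated mixture of two prompt branches.

    image_feat      : (d,) normalized image embedding from a frozen CLIP encoder.
    base_text_feats : (n_base, d) text embeddings produced by the base-domain prompt.
    new_text_feats  : (n_new, d) text embeddings produced by the new-domain prompt.
    gate_score      : scalar in [0, 1]; higher means "more likely new domain".
    tau             : assumed gating threshold (illustrative, not from the paper).
    """
    if gate_score > tau:
        # Gate says "new domain": classify among new classes with the new prompt.
        return "new", int(np.argmax(new_text_feats @ image_feat))
    # Otherwise classify among base classes with the base-domain prompt.
    return "base", int(np.argmax(base_text_feats @ image_feat))
```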
Link To Code: https://github.com/huacong/openworldauc
Primary Area: General Machine Learning
Keywords: Open-world Prompt Tuning, OOD detection
Submission Number: 5774