Parameter-Efficient Multi-Source Domain-Adaptive Prompt Tuning for Open-Vocabulary Object Detection

ICLR 2026 Conference Submission 15038 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: domain shift, open-vocabulary, object detection
Abstract: Cross-domain open-vocabulary object detection (COVD) poses a unique and underexplored challenge, requiring models to generalize across both domain shifts and category shifts. To tackle this, we propose MAP: a parameter-efficient Multi-source domain-Adaptive Prompt tuning framework that leverages multiple labeled source domains to improve detection in novel, unlabeled target domains with unseen categories. MAP consists of two key components: Multi-Source Prompt Learning (MSPL) and Unsupervised Target Prompt Learning (UTPL). MSPL disentangles domain-invariant category semantics from domain-specific visual patterns by jointly learning shared and domain-aware prompts. UTPL enhances generalization in the unlabeled target domain by enforcing prediction consistency under text-guided style augmentations and introducing a novel entropy-minimization objective that avoids reliance on pseudo-labels. Together, these components enable effective alignment of visual and textual representations across both domains and categories. In addition, we present a theoretical analysis of the proposed prompts, examining their behavior through the lenses of fidelity and distinction. Extensive experiments on challenging COVD benchmarks demonstrate that MAP achieves state-of-the-art performance with significantly fewer additional parameters.
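The abstract does not include an implementation, so the following PyTorch sketch is only illustrative of the two components it describes. It assumes a CLIP-style frozen text encoder; the module and function names (MultiSourcePrompts, utpl_loss), the prompt dimensions, and the exact loss form are all assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiSourcePrompts(nn.Module):
    """Hypothetical MSPL-style prompts: one shared (domain-invariant) prompt
    plus one domain-aware prompt per labeled source domain, prepended to the
    token embeddings of each class name before a frozen text encoder."""

    def __init__(self, num_domains: int, prompt_len: int = 4, dim: int = 512):
        super().__init__()
        self.shared = nn.Parameter(0.02 * torch.randn(prompt_len, dim))
        self.domain = nn.Parameter(0.02 * torch.randn(num_domains, prompt_len, dim))

    def forward(self, class_embeds: torch.Tensor, domain_idx: int) -> torch.Tensor:
        # class_embeds: (num_classes, seq_len, dim) embeddings of class-name tokens
        n = class_embeds.size(0)
        shared = self.shared.unsqueeze(0).expand(n, -1, -1)
        domain = self.domain[domain_idx].unsqueeze(0).expand(n, -1, -1)
        return torch.cat([shared, domain, class_embeds], dim=1)


def utpl_loss(logits_clean: torch.Tensor, logits_styled: torch.Tensor,
              lam: float = 1.0) -> torch.Tensor:
    """Hypothetical UTPL-style objective on unlabeled target images:
    consistency between a clean view and a text-guided style-augmented view,
    plus entropy minimization in place of pseudo-labels."""
    p_clean = F.softmax(logits_clean, dim=-1)
    p_styled = F.softmax(logits_styled, dim=-1)
    # Symmetric KL keeps the class distributions of the two views aligned.
    consistency = 0.5 * (
        F.kl_div(p_styled.clamp_min(1e-8).log(), p_clean, reduction="batchmean")
        + F.kl_div(p_clean.clamp_min(1e-8).log(), p_styled, reduction="batchmean")
    )
    # Low entropy encourages confident target-domain predictions without
    # committing to hard pseudo-labels.
    entropy = -(p_clean * p_clean.clamp_min(1e-8).log()).sum(dim=-1).mean()
    return consistency + lam * entropy
```

In this reading, only the prompt parameters are trained while the detector backbone and text encoder stay frozen, which would account for the parameter efficiency the abstract claims; how the style augmentations are generated from text is not specified here.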
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 15038