Leveraging VLMs for MUDA: A Category-Specific Prompting with Multi-Modal Low-Rank Adapter

27 Sept 2024 (modified: 17 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Multi-source Unsupervised Domain Adaptation, Low-Rank Adaptation, Pre-trained Vision-Language Model, Multi-Modal Alignment
TL;DR: We propose a method that combines learnable prompts, which capture knowledge shared across domains, with multimodal Low-Rank Adaptation (LoRA) modules that capture domain-specific knowledge, enabling effective adaptation to the target domain.
Abstract: Multi-Source Domain Adaptation (MSDA) aims to adaptively transfer knowledge from multiple source pre-trained models to an unlabeled target domain. Current MSDA methods typically require extensive parameter tuning for each source model, which becomes computationally expensive when dealing with many source domains or large source models. With recent advances in Vision-Language Models (VLMs) as natural source models, the challenges of multi-source cross-domain tasks have evolved: 1) VLMs adapt rapidly to downstream tasks through prompt tuning, yet learnable prompt tokens are prone to overfitting due to limited training samples; 2) efficiently leveraging knowledge from multiple source domains while encouraging the learning of domain-invariant representations remains a central issue; 3) visual and textual domain gaps, as well as cross-modal misalignment, can significantly degrade model performance. In this paper, we propose a fine-tuning framework that integrates prompts with multimodal Low-Rank Adaptation (LoRA). The framework employs learnable prompt features as characteristics shared across domains and uses multimodal LoRA matrices to represent domain-specific features, fine-tuning the VLM individually on each source domain. It further encourages interaction between the fine-tuning parameters of different domains and modalities to enhance consistency. We then combine all source-domain-specific LoRA modules into an integrated module via a set of learnable coefficients and adapt this integrated module on the target domain. Extensive experiments demonstrate that our approach achieves significant improvements on standard image classification benchmarks, highlighting its effectiveness in multi-source domain adaptation tasks.
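To make the coefficient-weighted merging of domain-specific LoRA modules described in the abstract concrete, the sketch below shows one plausible PyTorch realization. It is a minimal illustration under our own assumptions, not the authors' released code: the class name `LoRALinear`, the per-domain low-rank pairs `A`/`B`, the softmax-normalized mixing weights `coef`, and the `rank`/`alpha` hyperparameters are all hypothetical choices standing in for the paper's unspecified implementation details.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Hypothetical sketch: a frozen linear layer augmented with one
    low-rank adapter per source domain, merged by learnable coefficients
    for adaptation on the target domain."""

    def __init__(self, in_dim, out_dim, num_domains, rank=4, alpha=1.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)   # pre-trained weight, kept frozen
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # One low-rank pair (A, B) per source domain: delta_W_d = B_d @ A_d
        self.A = nn.Parameter(torch.randn(num_domains, rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_domains, out_dim, rank))
        # Coefficients weighting each domain's adapter in the integrated module
        # (softmax-normalized here; the paper does not specify the scheme).
        self.coef = nn.Parameter(torch.zeros(num_domains))
        self.scale = alpha / rank

    def forward(self, x):
        w = torch.softmax(self.coef, dim=0)       # (num_domains,)
        # Coefficient-weighted sum of domain-specific low-rank updates.
        delta = torch.einsum("d,dor,dri->oi", w, self.B, self.A)
        return self.base(x) + self.scale * nn.functional.linear(x, delta)
```

During source training, one would presumably update only domain `d`'s pair `(A_d, B_d)` on that domain's data; during target adaptation, the coefficients (and optionally the adapters) are tuned on unlabeled target data, which is what "combining all source domain-specific LoRA modules into an integrated module" suggests.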
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10714