Few-Shot Dual-Path Adaptation of Vision-Language Foundation Models

Authors: ICLR 2024 Workshop ME-FoMo Submission 71 Authors (anonymous)

Published: 04 Mar 2024 · Last Modified: 02 May 2024 · ME-FoMo 2024 Poster · CC BY 4.0
Keywords: Vision-Language Models, Few-Shot Learning, Domain Generalization, Transfer Learning
TL;DR: We introduce DualAdapter, a novel framework that pairs positive and negative adapters across both the vision and language modalities, enabling efficient and effective few-shot adaptation.
Abstract: Leveraging vast datasets from the Internet, large-scale Vision-Language Models (VLMs) demonstrate great potential in learning open-world visual concepts and exhibit remarkable performance across a wide range of downstream tasks through efficient fine-tuning. In this work, we propose a simple yet effective fine-tuning approach called DualAdapter, which for the first time investigates the inference capabilities of VLMs along both positive and negative directions. Unlike conventional approaches that rely solely on positive adapter-style fine-tuning, DualAdapter uniquely incorporates negative text descriptions and image samples, enabling fine-tuning from a dual perspective. During the few-shot adaptation process, our DualAdapter explicitly enhances correct alignments while simultaneously minimizing incorrect associations. Our rigorous evaluation across 15 datasets reveals that DualAdapter significantly surpasses existing state-of-the-art methods in terms of both adaptation efficiency and robustness to distribution shifts.
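To make the dual-path scoring idea from the abstract concrete, below is a minimal sketch, assuming a CLIP-like model with L2-normalized embeddings. The function name `dual_path_logits`, the positive/negative prompt split, the subtraction rule, and the weight `alpha` are all illustrative assumptions for exposition, not the paper's actual formulation or API.

```python
# Hypothetical sketch: combine alignment with positive class descriptions and
# misalignment with negative descriptions into a single set of class logits.
import torch
import torch.nn.functional as F

def dual_path_logits(img_feat: torch.Tensor,
                     pos_txt: torch.Tensor,
                     neg_txt: torch.Tensor,
                     alpha: float = 0.5) -> torch.Tensor:
    """img_feat: (D,) image embedding; pos_txt / neg_txt: (C, D) per-class
    embeddings of positive / negative text descriptions; all L2-normalized."""
    pos = img_feat @ pos_txt.t()   # reward correct image-text alignment
    neg = img_feat @ neg_txt.t()   # score alignment with negative descriptions
    return pos - alpha * neg       # suppress incorrect associations

# Toy usage with random normalized features (C = 3 classes, D = 512 dims).
D, C = 512, 3
img = F.normalize(torch.randn(D), dim=0)
pos = F.normalize(torch.randn(C, D), dim=1)
neg = F.normalize(torch.randn(C, D), dim=1)
pred = dual_path_logits(img, pos, neg).argmax().item()
```

In this reading, the positive path plays the usual zero-shot classification role, while the negative path penalizes classes whose "negative" descriptions the image also matches; the few-shot adapters would then refine both paths jointly.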
Submission Number: 71