DAT: Dual-Branch Adapter-Tuning for Few-Shot Recognition

Published: 01 Jan 2025, Last Modified: 27 Jul 2025 · IEEE Trans. Circuits Syst. Video Technol. 2025 · CC BY-SA 4.0
Abstract: Parameter-efficient fine-tuning of vision-language models such as CLIP for few-shot learning has recently received considerable attention. However, previous works fine-tune only the image branch or only the text branch, breaking the alignment between the two original branches; fine-tuning both branches of CLIP, on the other hand, inevitably introduces more trainable parameters and is prone to more severe over-fitting given the limited training data. In this study, we propose a novel Dual-branch Adapter-Tuning framework (DAT), which collaboratively trains a visual adapter and a textual adapter added to the two branches of the original CLIP under multiple consistency constraints. By exploiting semantically detailed class-specific prompts and the outputs of the original CLIP to guide the fine-tuning of both branches, our method adapts effectively to downstream few-shot learning tasks and alleviates over-fitting, while maximally preserving the generalization ability of the original CLIP model. The proposed framework achieves superior performance over existing approaches on diverse datasets under various few-shot learning settings. The source code is available at https://github.com/SandyXi/DAT.
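The abstract does not spell out implementation details, so the following PyTorch sketch only illustrates one plausible shape of a dual-branch adapter setup with a consistency constraint against the frozen CLIP outputs. The bottleneck adapter design, the residual mixing ratio, the KL-based consistency term, and the weight `lam_consist` are assumptions for illustration, not the paper's actual components.

```python
# Minimal sketch (not the authors' code): lightweight adapters on frozen CLIP
# image/text features, trained with a few-shot classification loss plus a
# consistency term that keeps predictions close to the original CLIP's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter applied on top of a frozen CLIP feature branch."""
    def __init__(self, dim: int, reduction: int = 4, ratio: float = 0.2):
        super().__init__()
        self.ratio = ratio  # residual mixing weight (assumed value)
        self.net = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Blend adapted features with the original (frozen) features.
        return self.ratio * self.net(x) + (1.0 - self.ratio) * x

def dat_losses(img_feat, txt_feat, img_adapter, txt_adapter, labels,
               logit_scale=100.0, lam_consist=1.0):
    """Few-shot cross-entropy plus a consistency constraint that discourages
    the adapted logits from drifting away from the zero-shot CLIP logits."""
    # Adapted and original (frozen) branch outputs, L2-normalized.
    img_a = F.normalize(img_adapter(img_feat), dim=-1)
    txt_a = F.normalize(txt_adapter(txt_feat), dim=-1)
    img_0 = F.normalize(img_feat, dim=-1)
    txt_0 = F.normalize(txt_feat, dim=-1)

    # img_feat: (batch, dim); txt_feat: (num_classes, dim) from class prompts.
    logits_adapted = logit_scale * img_a @ txt_a.t()
    logits_zeroshot = logit_scale * img_0 @ txt_0.t()

    ce = F.cross_entropy(logits_adapted, labels)
    # One possible consistency form: KL divergence between adapted and
    # zero-shot prediction distributions.
    consist = F.kl_div(F.log_softmax(logits_adapted, dim=-1),
                       F.softmax(logits_zeroshot, dim=-1),
                       reduction="batchmean")
    return ce + lam_consist * consist
```

In this sketch only the two adapters are trainable; the CLIP encoders stay frozen, which is the usual way such designs limit trainable parameters and curb over-fitting on few-shot data.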