Abstract: The prompt tuning paradigm, with its advantages of a low parameter count and stable training, has recently inspired numerous applications of CLIP-like vision-language models to federated learning. However, in this work, we posit that under significant domain gaps across federated participants, prompt-based CLIP may easily collapse to suboptimal solutions because it neglects domain-aware knowledge. We present a novel prompt tuning method, termed ADAPT, that addresses this issue by learning both intra- and inter-domain prompts. Specifically, we assign each federated participant a domain-specific prompt and use the image's visual features as a condition to guide the generation of language features, with the underlying idea that the prompted CLIP should detect the input image's domain correspondence before predicting its category. Extensive experiments demonstrate ADAPT's efficiency and effectiveness in federated learning. For example, by learning and sharing only 2.1M parameters, ADAPT attains 69.8% average accuracy over the six domains of DomainNet, improving over the original CLIP by 16.2%.
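To make the abstract's description concrete, below is a minimal illustrative sketch of the idea of a per-participant, domain-specific prompt whose tokens are conditioned on the image's visual features. This is not the authors' released implementation; the module name `DomainConditionedPrompt`, the `meta_net` conditioning network, and all dimensions are hypothetical assumptions for illustration only.

```python
# Illustrative sketch only -- not the authors' code. Each federated participant
# holds a domain-specific prompt; the image's visual features (from a frozen
# CLIP image encoder) condition the prompt before it is fed to the text encoder.
import torch
import torch.nn as nn


class DomainConditionedPrompt(nn.Module):
    def __init__(self, prompt_len: int = 16, embed_dim: int = 512):
        super().__init__()
        # Domain-specific prompt learned (and shared) by one participant.
        self.domain_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
        # Small network mapping visual features to a conditioning vector
        # added to every prompt token.
        self.meta_net = nn.Sequential(
            nn.Linear(embed_dim, embed_dim // 4),
            nn.ReLU(inplace=True),
            nn.Linear(embed_dim // 4, embed_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, embed_dim) from the frozen CLIP image encoder.
        bias = self.meta_net(image_features)                     # (batch, embed_dim)
        prompts = self.domain_prompt.unsqueeze(0) + bias.unsqueeze(1)
        return prompts                                           # (batch, prompt_len, embed_dim)


if __name__ == "__main__":
    prompt_module = DomainConditionedPrompt()
    dummy_image_features = torch.randn(4, 512)
    print(prompt_module(dummy_image_features).shape)  # torch.Size([4, 16, 512])
```

In a federated setting, only such lightweight prompt parameters (on the order of a few million, per the reported 2.1M figure) would be trained and exchanged, while the CLIP backbone stays frozen.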
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Boqing_Gong1
Submission Number: 5130