Enhancing Foundation Models with Federated Domain Knowledge Infusion

Published: 01 May 2025 · Last Modified: 23 Jul 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Vision foundation models (FMs) like CLIP have exhibited exceptional capabilities in visual and linguistic understanding, particularly in zero-shot inference tasks. However, these models struggle with data that significantly deviates from their training samples, necessitating fine-tuning, which is often infeasible in centralized settings due to data privacy concerns. Federated learning (FL) combined with parameter-efficient fine-tuning (PEFT) offers a potential solution, yet existing methods struggle to capture domain-specific characteristics and to generalize out of domain. We propose Federated Adapter Generalization (FedAG), a novel cross-silo federated fine-tuning approach that leverages multiple fine-grained adapters to capture domain-specific knowledge while enhancing out-of-domain generalization. Our method uses quality-aware in-domain mutual learning and attention-regularized cross-domain learning to integrate domain-specific insights effectively. Experiments with the CLIP model on three domain-shifting datasets, ImageCLEF-DA, Office-Home, and DomainNet, demonstrate the superior performance of FedAG in both in-domain and out-of-domain scenarios. We envision this work as a milestone in generalizing CLIP to handle out-of-domain knowledge under the federated learning setting.
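The adapters mentioned above are the standard PEFT building block: small trainable modules attached to a frozen backbone, so that only a few parameters are updated and exchanged between clients. As a minimal sketch (the dimensions, initialization, and class name here are illustrative assumptions, not the paper's exact architecture), a bottleneck adapter looks like this:

```python
import numpy as np

class Adapter:
    """Hypothetical bottleneck adapter over frozen backbone features.

    In federated PEFT, only these small projection weights would be
    trained and communicated, keeping client updates lightweight.
    """

    def __init__(self, dim=512, bottleneck=64, seed=0):
        rng = np.random.default_rng(seed)
        # Down- and up-projection weights (illustrative init scale).
        self.W_down = rng.standard_normal((dim, bottleneck)) * 0.02
        self.W_up = rng.standard_normal((bottleneck, dim)) * 0.02

    def __call__(self, x):
        # Residual design: frozen features pass through unchanged, and the
        # adapter adds a small learned domain-specific correction on top.
        h = np.maximum(x @ self.W_down, 0.0)  # down-project + ReLU
        return x + h @ self.W_up              # up-project + residual

features = np.ones((2, 512))   # stand-in for frozen CLIP image features
adapted = Adapter()(features)
print(adapted.shape)           # output keeps the backbone's feature shape
```

Because the output has the same shape as the input, such adapters can be inserted after backbone layers without changing the rest of the model; FedAG's contribution lies in how multiple such domain-specific adapters are trained and combined across clients.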
Lay Summary: AI models like CLIP are great at understanding images and text but struggle when faced with unfamiliar data, especially in sensitive areas like healthcare or finance where data can't be shared. To solve this, we created a method called FedAG that allows multiple institutions to collaboratively fine-tune such models without sharing private data. It uses small, smart components called adapters to learn from each institution's unique data, while also learning how to generalize across different data sources. This approach helps the model perform well not just within known settings, but also in new, unseen scenarios, paving the way for more trustworthy and versatile AI that respects data privacy.
Primary Area: Applications->Computer Vision
Keywords: Federated Learning, Foundation Model
Submission Number: 4046