Federated Adapter on Foundation Models: An Out-Of-Distribution Approach

Yiyuan Yang; Guodong Long; Tianyi Zhou; Qinghua Lu; Shanshan Ye; Jing Jiang

16 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Federated Learning, Foundation Models

Abstract: As foundation models gain increasing attention from both academic and industrial communities, Federated Foundation Models (FedFM) have emerged as a privacy-preserving approach for collaboratively fine-tuning models in federated learning (FL) frameworks using datasets distributed across multiple clients. Given the versatile nature of foundation models, a key challenge for FedFM is out-of-distribution (OOD) generalization: unseen tasks or clients may exhibit distribution shifts that lead to suboptimal performance. Although numerous studies have explored OOD generalization in conventional FL, these methods are inadequate for FedFM for two reasons. First, large parameter scales result in high computational and communication costs. Second, increased data heterogeneity, such as cross-domain data, degrades the performance of the aggregated global model on individual client distributions. To bridge this gap, we propose a new method, called FedOA, to enhance the OOD generalization of FedFM under these conditions. Specifically, our method employs adapter-based parameter-efficient fine-tuning for efficient learning, and introduces an additional personalized model with a feature distance-based regularization to ensure distribution alignment and provide OOD generalization guarantees for each client. Theoretically, we demonstrate that the conventional aggregated global model in FedFM inherently retains OOD generalization capabilities, and that our proposed method enhances the personalized model's OOD generalization through regularization informed by the global model, with proven convergence under general non-convex settings. Empirically, the effectiveness of the proposed method is validated on benchmark datasets across various NLP tasks.
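
The two ingredients named in the abstract, aggregating only lightweight adapter parameters and regularizing a personalized model by its feature distance to the global model, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact objective, so the squared L2 feature distance, the weight `lam`, the toy `features` function (a `tanh` of a linear map standing in for a frozen backbone plus adapter), and the sample-size-weighted averaging in `fedavg` are all assumptions made here for illustration.

```python
import numpy as np


def fedavg(adapters, num_samples):
    """Average client adapter matrices, weighted by client sample counts.

    Assumption: plain FedAvg-style weighting; only adapter parameters are
    exchanged, which is what keeps communication cheap.
    """
    weights = np.asarray(num_samples, dtype=float)
    weights /= weights.sum()
    return sum(w * a for w, a in zip(weights, adapters))


def features(adapter, x):
    """Hypothetical feature map: a toy stand-in for backbone plus adapter."""
    return np.tanh(x @ adapter)


def personalized_loss(personal, global_adapter, x, y, lam):
    """Task loss plus a feature-distance penalty toward the global model.

    Assumption: squared L2 distance between personalized and global
    features, scaled by `lam`; the paper's regularizer may differ.
    """
    feats_p = features(personal, x)
    feats_g = features(global_adapter, x)
    task = np.mean((feats_p.mean(axis=1) - y) ** 2)          # toy regression loss
    reg = np.mean(np.sum((feats_p - feats_g) ** 2, axis=1))  # feature distance
    return task + lam * reg


# Toy data: 8 samples, 4 input dimensions, 3 feature dimensions, 3 clients.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = rng.normal(size=8)
client_adapters = [rng.normal(size=(4, 3)) for _ in range(3)]

global_adapter = fedavg(client_adapters, [10, 20, 30])
loss = personalized_loss(client_adapters[0], global_adapter, x, y, lam=1.0)
print(f"personalized loss for client 0: {loss:.4f}")
```

In this sketch, setting `lam` to zero recovers purely local training, while a larger `lam` pulls the personalized features toward those of the aggregated global model, which is the alignment effect the abstract attributes to the regularizer.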

Primary Area: other topics in machine learning (i.e., none of the above)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 1050
