OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning

Published: 04 Mar 2024 · Last Modified: 02 May 2024 · DPFM 2024 Poster · CC BY 4.0
Keywords: Large Language Models, Federated Learning, Instruction Tuning, Value Alignment
TL;DR: We introduce an integrated and concise framework for training LLMs via federated learning, provide a comprehensive empirical study, and point out future directions.
Abstract: Trained on massive publicly available data, large language models (LLMs) have demonstrated tremendous success across various fields. While more data contributes to better performance, a disconcerting reality is that high-quality public data will be exhausted in a few years. In this paper, we offer a potential next step for contemporary LLMs: collaborative and privacy-preserving LLM training on underutilized distributed private data via federated learning (FL), where multiple data owners collaboratively train a shared model without transmitting raw data. To achieve this, we build a concise, integrated, and research-friendly framework/codebase, named OpenFedLLM. It covers federated instruction tuning for enhancing instruction-following capability, federated value alignment for aligning with human values, and 7 representative FL algorithms. In addition, OpenFedLLM supports training on diverse domains, covering 8 training datasets, and provides comprehensive evaluations with 30+ metrics. Through extensive experiments, we observe that all FL algorithms outperform local training when training LLMs, yielding clear performance improvements across a variety of settings. Notably, on a financial benchmark, Llama2-7B fine-tuned with any FL algorithm outperforms GPT-4 by a significant margin, whereas the model obtained through individual training does not, giving clients a strong incentive to participate in FL. Code is available at https://github.com/rui-ye/OpenFedLLM.
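
To make the federated setup concrete, below is a minimal FedAvg-style sketch in PyTorch: each client fine-tunes a copy of the shared weights on its own private data, and the server averages the resulting parameters each round without ever seeing raw data. The tiny linear model, synthetic client datasets, and hyperparameters are placeholders for illustration only; this is not the OpenFedLLM API.

```python
# Minimal FedAvg sketch (illustrative; not the OpenFedLLM codebase).
# A small linear model stands in for an LLM; each "client" holds private data
# and the server aggregates client updates each communication round.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical private datasets: 3 clients, each with its own (x, y) pairs.
clients = [(torch.randn(32, 8), torch.randn(32, 1)) for _ in range(3)]

global_model = nn.Linear(8, 1)

def local_update(global_state, data, lr=0.01, steps=5):
    """One client's local fine-tuning starting from the shared global weights."""
    model = nn.Linear(8, 1)
    model.load_state_dict(global_state)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    x, y = data
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    # Only model parameters (not raw data) leave the client.
    return model.state_dict(), len(x)

for rnd in range(10):  # communication rounds
    global_state = copy.deepcopy(global_model.state_dict())
    updates = [local_update(global_state, data) for data in clients]

    # FedAvg: weight each client's parameters by its local sample count.
    total = sum(n for _, n in updates)
    new_state = {
        key: sum(state[key] * (n / total) for state, n in updates)
        for key in global_state
    }
    global_model.load_state_dict(new_state)
```

In practice, frameworks in this space typically replace the toy model with a pretrained LLM (often with parameter-efficient adapters so only small weight deltas are communicated) and swap in other aggregation rules for the averaging step.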
Submission Number: 62