FedRepre: An Efficient and Scalable Federated Learning Framework with Client Representative Mechanism and Specialized Server Architecture

Published: 30 May 2024 · Last Modified: 17 Jun 2024 · MLArchSys 2024 Oral · CC BY 4.0
Workshop Track: System for Machine Learning
Presentation: In-Person
Keywords: Federated Learning System; Client Selection; CXL Server
Presenter Full Name: Yitu Wang
Presenter Email: yitu.wang@duke.edu
Abstract: Federated learning (FL) is an emerging distributed machine learning (ML) technique that enables model training across heterogeneous devices while preserving data privacy. However, deploying FL in real-world environments faces significant challenges that hinder performance and convergence efficiency. Specifically, the participating devices often have unbalanced local dataset distributions, uneven available computational capabilities, and fluctuating real-time network speeds. Moreover, scaling up the FL system to massive device populations magnifies the importance of the client selection strategy, whose execution may emerge as a new bottleneck in the FL system. Unfortunately, prior work has yet to simultaneously address these pressing challenges surrounding real-world FL deployments. We propose FEDREPRE, an efficient and scalable FL framework that accelerates real-world FL. FEDREPRE introduces a bi-level active client selection strategy, called the client representative mechanism, to guarantee fast convergence of the global model while reducing the client selection complexity. Specifically, clients are first clustered based on statistical correlations; cluster selection and representative selection are then conducted to attain the maximal global-loss decrease and the minimal communication and training latency, respectively. To further enhance scalability, FEDREPRE employs a specialized server architecture to reduce the computation time of the client selection algorithm on the server. We adopt Compute Express Link (CXL) to build an efficient memory system that unifies the memory space across the memory resources of different devices. In addition, we offload the customized hardware selection kernel onto an FPGA with an optimized workflow. We empirically evaluate FEDREPRE across settings with varying scales and heterogeneity levels. The results show that FEDREPRE outperforms previous client selection strategies, achieving a 2.16×–19.54× speedup in convergence time and up to 1.63% accuracy improvement.
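To make the bi-level client representative mechanism concrete, the following is a minimal sketch of how such a selection round could look. It assumes each client reports a label histogram, a recent local loss, and an estimated round latency; the names (Client, select_representatives, k_clusters, clusters_per_round) and the scoring heuristics are illustrative assumptions, not FedRepre's actual formulation or API.

```python
# Hypothetical sketch of bi-level client selection: cluster clients by
# statistical similarity, pick clusters by a loss-decrease proxy, then
# pick the lowest-latency representative per cluster.
from dataclasses import dataclass
import numpy as np
from sklearn.cluster import KMeans

@dataclass
class Client:
    cid: int
    label_hist: np.ndarray   # normalized label distribution of local data
    local_loss: float        # most recent local training loss
    est_latency: float       # estimated communication + training time (s)

def select_representatives(clients, k_clusters=8, clusters_per_round=4):
    # Level 1: cluster clients by the statistical similarity of their data.
    feats = np.stack([c.label_hist for c in clients])
    labels = KMeans(n_clusters=k_clusters, n_init=10).fit_predict(feats)

    # Level 2a: rank clusters by mean local loss (a proxy for expected
    # global-loss decrease) and keep the top clusters for this round.
    def cluster_loss(k):
        losses = [c.local_loss for c, l in zip(clients, labels) if l == k]
        return np.mean(losses) if losses else 0.0
    chosen = sorted(range(k_clusters), key=cluster_loss, reverse=True)[:clusters_per_round]

    # Level 2b: within each chosen cluster, select the client with the
    # lowest estimated latency as its representative.
    reps = []
    for k in chosen:
        members = [c for c, l in zip(clients, labels) if l == k]
        if members:
            reps.append(min(members, key=lambda c: c.est_latency))
    return reps
```

In this sketch, only the selected representatives would train and upload updates in a round, which keeps the per-round selection cost proportional to the number of clusters rather than the full client population.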
Presenter Bio: Yitu Wang is a Ph.D. candidate at Duke ECE. His research focuses on near-memory/storage processing for data-intensive applications and deep learning systems.
Paper Checklist Guidelines: I certify that all co-authors have validated the presented results and conclusions, and have read and commit to adhering to the Paper Checklist Guidelines, Call for Papers and Publication Ethics.
Dataset Release: I certify that all co-authors commit to release the dataset and necessary scripts to reproduce the presented results.
Workshop Registration: Yes, at least one of the authors has registered for the workshop (Two-Day Registration at minimum).
Submission Number: 1