TL;DR: We propose a new federated split learning algorithm with a sublinear convergence rate and superior communication efficiency compared to the state of the art.
Abstract: Collaborative training methods like Federated Learning (FL) and Split Learning (SL) enable distributed machine learning without sharing raw data.
However, FL assumes that each client can train the entire model, which is infeasible for large-scale models.
SL, in contrast, alleviates the client-side memory constraint of FL by offloading most of the training to the server, but it incurs higher latency due to its sequential nature.
Other methods address this trade-off by using local loss functions for parallel client-side training, which improves efficiency but forgoes server feedback and can degrade accuracy.
We propose FSL-SAGE (Federated Split Learning via Smashed Activation Gradient Estimation), a new federated split learning (FSL) algorithm that estimates server-side gradient feedback via auxiliary models.
These auxiliary models periodically adapt to emulate the server's behavior on each client's local dataset.
We show that FSL-SAGE achieves a convergence rate of $\mathcal{O}(1/\sqrt{T})$, where $T$ is the number of communication rounds.
This rate matches that of FedAvg, while FSL-SAGE significantly reduces communication costs and client memory requirements.
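For non-convex objectives, a rate of this order is typically stated in the following form; this is an illustrative template under standard smoothness and bounded-variance assumptions, not the paper's exact theorem statement:
$$\min_{t \in \{1,\dots,T\}} \; \mathbb{E}\big[\|\nabla F(\mathbf{x}_t)\|^2\big] \;\le\; \frac{C}{\sqrt{T}},$$
where $F$ is the global objective, $\mathbf{x}_t$ is the iterate after round $t$, and $C$ is a constant that depends on problem parameters such as the smoothness constant and the gradient-variance bound.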
Our empirical results further show that FSL-SAGE outperforms existing state-of-the-art FSL methods in both communication efficiency and accuracy.
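As a rough illustration of the gradient-estimation idea described in the abstract, the sketch below shows how a client might train its local sub-model against an auxiliary model's estimate of the cut-layer (smashed-activation) gradient, and how that auxiliary model could periodically be aligned with the true server gradient. The model shapes, the auxiliary model's inputs (activations concatenated with one-hot labels), and the MSE alignment objective are illustrative assumptions, not FSL-SAGE's actual design; see the linked repository for the authors' implementation.

```python
# Minimal sketch (PyTorch) of split learning with auxiliary gradient estimation.
# Shapes, the auxiliary model's design, and the alignment loss are illustrative
# assumptions, not the actual FSL-SAGE architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

client_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # client-side layers (hypothetical sizes)
server_model = nn.Sequential(nn.Linear(64, 10))             # server-side layers (hypothetical sizes)
aux_model = nn.Linear(64 + 10, 64)                          # maps (activations, labels) -> gradient estimate

x = torch.randn(8, 32)                                      # dummy local mini-batch
y = torch.randint(0, 10, (8,))
y_onehot = F.one_hot(y, 10).float()

# --- Local client step: backpropagate an *estimated* cut-layer gradient ---
smashed = client_model(x)                                   # "smashed" activations at the cut layer
est_grad = aux_model(torch.cat([smashed.detach(), y_onehot], dim=1))
client_model.zero_grad()
smashed.backward(est_grad.detach())                         # no server round trip needed for this step
# (a client optimizer step would follow here)

# --- Periodic alignment: fit the auxiliary model to the true server gradient ---
smashed_srv = client_model(x).detach().requires_grad_(True)
loss = F.cross_entropy(server_model(smashed_srv), y)
true_grad, = torch.autograd.grad(loss, smashed_srv)         # exact dL/d(smashed) from the server
aux_model.zero_grad()
aux_loss = F.mse_loss(
    aux_model(torch.cat([smashed_srv.detach(), y_onehot], dim=1)), true_grad)
aux_loss.backward()                                         # (an auxiliary optimizer step would follow)
```

In this sketch, the server's exact gradient is queried only during the periodic alignment step, which is what would reduce communication relative to standard split learning, where every local update requires a server round trip.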
Lay Summary: With the impressive performance of recently developed AI models like ChatGPT, there is a clear need for many computers and devices to take part in training big models collaboratively. Many small devices, like smartphones and laptops, hold a lot of data that would be useful for training large AI models, yet they cannot participate in training because they have far fewer resources, such as GPU compute and memory. We design an algorithm that allows such low-resource devices to take part in training large models, with whatever resources they can offer, and without giving up the privacy of their data. Our method and its future extensions can greatly shape the way we train AI models today by enabling collaboration among many players with useful data but limited resources, such as academic institutions, hospitals, and small firms, while preserving privacy standards.
Link To Code: https://github.com/srijith1996/FSL-SAGE
Primary Area: Optimization->Large Scale, Parallel and Distributed
Keywords: federated split learning, communication efficient split learning, client-constrained federated learning
Submission Number: 12731