Empowering Federated Learning With Implicit Gossiping: Mitigating Connection Unreliability Amidst Unknown and Arbitrary Dynamics
Abstract: Federated learning is a popular distributed learning approach for training a machine learning model without disclosing raw data. It consists of a parameter server and a possibly large collection of clients (e.g., in cross-device federated learning) that may operate in congested and changing environments. In this paper, we study federated learning in the presence of stochastic and dynamic communication failures, wherein the uplink between the parameter server and client $i$ is on with unknown probability $p_{i}^{t}$ in round $t$. Furthermore, we allow the dynamics of $p_{i}^{t}$ to be arbitrary. We first demonstrate that when the $p_{i}^{t}$'s vary across clients, the most widely adopted federated learning algorithm, Federated Averaging (FedAvg), experiences significant bias. To mitigate this bias, we propose Federated Postponed Broadcast (FedPBC), a simple variant of FedAvg. It differs from FedAvg in that the parameter server postpones broadcasting the global model to the clients with active uplinks until the end of each training round. Despite uplink failures, we show that FedPBC converges to a stationary point of the original non-convex objective. On the technical front, postponing the global-model broadcasts enables implicit gossiping among the clients with active links in round $t$. In spite of the time-varying nature of $p_{i}^{t}$, we can bound the perturbation of the global-model dynamics using techniques for controlling gossip-type information-mixing errors. Extensive experiments on real-world datasets, covering a diverse set of unreliable uplink patterns, corroborate our analysis.
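To make the round structure concrete, below is a minimal sketch of one FedPBC training round under unreliable uplinks. It is illustrative only, not the paper's implementation: the names `Client`, `local_update`, and `fedpbc_round`, the toy quadratic local losses, and the simulation of the per-client uplink probabilities $p_{i}^{t}$ are all assumptions introduced for this example.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Client:
    model: np.ndarray   # the client's local copy of the model parameters
    target: np.ndarray  # toy local optimum (stand-in for local data)

def local_update(model, target, lr=0.1, steps=5):
    """Toy local training: a few gradient steps on ||model - target||^2."""
    for _ in range(steps):
        model = model - lr * 2.0 * (model - target)
    return model

def fedpbc_round(server_model, clients, p_uplink, rng):
    """One FedPBC round (illustrative sketch).

    Unlike FedAvg, the server does not broadcast at the start of the
    round: each client trains from its own current local model, and the
    broadcast is postponed until after aggregation. Only the clients
    whose uplinks were active this round receive the new global model,
    which couples their models -- the implicit gossiping effect.
    """
    # Local training on every client, starting from its own local model.
    for c in clients:
        c.model = local_update(c.model, c.target)

    # Uplink i succeeds with unknown, time-varying probability p_i^t.
    active = [i for i, p in enumerate(p_uplink) if rng.random() < p]
    if not active:
        return server_model  # every uplink failed this round

    # The server averages only the models it actually received ...
    server_model = np.mean([clients[i].model for i in active], axis=0)

    # ... and then (postponed broadcast) syncs the same active clients.
    for i in active:
        clients[i].model = server_model.copy()
    return server_model

# Usage: p_i^t drawn afresh each round to mimic arbitrary dynamics.
rng = np.random.default_rng(0)
clients = [Client(model=np.zeros(2), target=rng.normal(size=2)) for _ in range(10)]
server = np.zeros(2)
for t in range(100):
    p_t = rng.uniform(0.2, 0.9, size=10)
    server = fedpbc_round(server, clients, p_t, rng)
```

Note how the sketch never uses the $p_{i}^{t}$'s inside the aggregation step: averaging only the received models and syncing only the active clients is what lets the method tolerate unknown and arbitrarily varying link probabilities, in contrast to reweighting schemes that would need $p_{i}^{t}$ estimates.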