Keywords: decentralized federated learning, random walks, adaptive optimization, communication efficiency
TL;DR: We introduce, theoretically analyse and empirically evaluate a decentralized version of Adam that uses random walks and shows performance on par to centralized FedAvg.
Abstract: We tackle the problem of federated learning (FL) in a peer-to-peer fashion without a central server. While prior work mainly considered gossip-style protocols for learning, our solution is based on random walks. This allows to communicate only to a single peer at a time, thereby reducing the total communication and enabling asynchronous execution. To improve convergence and reduce the need for extensive tuning, we consider an adaptive optimization method -- Adam. Two extensions reduce its communication costs: state compression and multiple local updates on each client. We theoretically analyse the convergence behaviour of the proposed algorithm and its modifications in the non-convex setting. We show that our method can achieve performance comparable to centralized FL without communication overhead. Empirical results are reported on a variety of tasks (vision, text), neural network architectures and large-scale federations (up to $\sim342$k clients).
Is Student: No