Basil: A Fast and Byzantine-Resilient Approach for Decentralized Training

15 Sept 2021, 00:52 | PRIML 2021 Poster | Readers: Everyone
Keywords: Distributed machine learning, byzantine robustness
TL;DR: We propose Basil, a fast and computationally efficient Byzantine robust algorithm for decentralized (serverless) training systems.
Abstract: Decentralized (i.e., serverless) learning across a large number of distributed nodes (e.g., mobile users) has seen a recent surge of interest. The key advantage of these setups is that they keep each user's local data private while not requiring a server to coordinate the training. They can, however, suffer substantially from Byzantine nodes in the network that degrade the training performance. Detecting and mitigating Byzantine behavior in a decentralized learning setting is a daunting task, especially when the data distribution across users is heterogeneous. As our main contribution, we propose \texttt{Basil}, a fast and computationally efficient Byzantine-robust algorithm for decentralized training systems, which leverages a novel sequential, memory-assisted, and performance-based criterion for training over a logical ring while filtering out the Byzantine users. In the IID dataset distribution setting, we provide theoretical convergence guarantees for \texttt{Basil}, demonstrating its linear convergence rate. Furthermore, for the IID setting, we experimentally demonstrate that \texttt{Basil} is robust to various Byzantine attacks, including the strong Hidden attack, while providing up to ${\sim}16\%$ higher test accuracy than the state-of-the-art Byzantine-resilient decentralized learning approach. Additionally, we generalize \texttt{Basil} to the non-IID dataset distribution setting by proposing Anonymous Cyclic Data Sharing (ACDS), a technique that allows each node to anonymously share a random fraction of its local non-sensitive dataset (e.g., landmark images) with all other nodes. We demonstrate that \texttt{Basil} alongside ACDS with only $5\%$ data sharing effectively tolerates Byzantine nodes, unlike the state-of-the-art Byzantine-robust algorithm, which completely fails in the heterogeneous data setting.
Paper Under Submission: The paper is NOT under submission at NeurIPS
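
The abstract describes the selection rule only at a high level (sequential training over a logical ring, memory-assisted, performance-based). The sketch below is one possible reading of that description on a toy problem, not the authors' implementation. It assumes that each honest node keeps the last `memory` models circulated on the ring, picks the one with the lowest loss on its own local data, takes one gradient step, and passes the result on. The function names, the 1-D linear model, the Gaussian-noise attack, and all parameter values are hypothetical choices made for illustration.

```python
import random


def loss(w, data):
    """Mean squared error of the 1-D linear model y = w * x on local data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def grad(w, data):
    """Gradient of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def select_model(candidates, data):
    """Performance-based choice: the stored model with the lowest local loss."""
    return min(candidates, key=lambda w: loss(w, data))


def ring_training(n_nodes=8, byzantine=frozenset({3}), memory=2,
                  rounds=20, lr=0.5, seed=0):
    """Toy sequential training over a logical ring (illustrative assumptions).

    `memory` is assumed to exceed the number of consecutive Byzantine
    nodes, so the buffer always holds at least one honest model.
    """
    rng = random.Random(seed)
    true_w = 3.0
    # IID local datasets: every node samples from the same distribution.
    datasets = []
    for _ in range(n_nodes):
        xs = [rng.uniform(-1, 1) for _ in range(20)]
        datasets.append([(x, true_w * x + rng.gauss(0, 0.01)) for x in xs])

    stored = [0.0] * memory  # most recent models circulated on the ring
    for _ in range(rounds):
        for i in range(n_nodes):  # nodes act one after another
            if i in byzantine:
                new_w = rng.gauss(0, 100)  # arbitrary (faulty) model
            else:
                best = select_model(stored, datasets[i])
                new_w = best - lr * grad(best, datasets[i])
            stored = (stored + [new_w])[-memory:]  # keep the newest models
    # Model an honest node would adopt after the final round.
    return select_model(stored, datasets[0])


if __name__ == "__main__":
    print(ring_training())
```

In this toy setting a faulty model is adopted only if it happens to achieve a lower local loss than the stored honest alternative, which is why the buffer size matters: with `memory=1` an honest node would have no alternative to a Byzantine predecessor's model.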