Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent

El Mahdi El Mhamdi; Rachid Guerraoui; Sébastien Rouault

Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent

El Mahdi El Mhamdi, Rachid Guerraoui, Sébastien Rouault

Published: 12 Jan 2021, Last Modified: 22 Jun 2025ICLR 2021 PosterReaders: Everyone

Keywords: Byzantine SGD, Distributed ML, Momentum

Abstract: Byzantine-resilient Stochastic Gradient Descent (SGD) aims at shielding model training from Byzantine faults, be they ill-labeled training datapoints, exploited software/hardware vulnerabilities, or malicious worker nodes in a distributed setting. Two recent attacks have been challenging state-of-the-art defenses though, often successfully precluding the model from even fitting the training set. The main identified weakness in current defenses is their requirement of a sufficiently low variance-norm ratio for the stochastic gradients. We propose a practical method which, despite increasing the variance, reduces the variance-norm ratio, mitigating the identified weakness. We assess the effectiveness of our method over 736 different training configurations, comprising the 2 state-of-the-art attacks and 6 defenses. For confidence and reproducibility purposes, each configuration is run 5 times with specified seeds (1 to 5), totalling 3680 runs. In our experiments, when the attack is effective enough to decrease the highest observed top-1 cross-accuracy by at least 20% compared to the unattacked run, our technique systematically increases back the highest observed accuracy, and is able to recover at least 20% in more than 60% of the cases.

One-sentence Summary: An inexpensive method to substantially improve the effectiveness of existing Byzantine-resilient SGD defenses, assessed against state-of-the-art attacks and supported by theoretical insights.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Data: [Fashion-MNIST](https://paperswithcode.com/dataset/fashion-mnist)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/distributed-momentum-for-byzantine-resilient/code)

9 Replies

Loading