Keywords: byzantine-robust optimization, federated learning, generalized smoothness, normalized SGD
TL;DR: We introduce Byz-NSGDM, a normalized momentum SGD method that is robust to Byzantine attacks under $(L_0,L_1)$-smoothness, achieving an $O(K^{-1/4})$ convergence rate and outperforming prior methods on heterogeneous MNIST and synthetic tasks.
Abstract: We consider distributed optimization under Byzantine attacks in the presence of $(L_0,L_1)$-smoothness, a generalization of standard $L$-smoothness that captures functions with state-dependent gradient Lipschitz constants. We propose $\texttt{Byz-NSGDM}$, a normalized stochastic gradient descent method with momentum that achieves robustness against Byzantine workers while maintaining convergence guarantees. Our algorithm combines momentum normalization with Byzantine-robust aggregation enhanced by Nearest Neighbor Mixing (NNM) to handle the challenges posed by both $(L_0,L_1)$-smoothness and Byzantine adversaries. We prove that $\texttt{Byz-NSGDM}$ achieves a convergence rate of $O(K^{-1/4})$ up to a Byzantine bias floor proportional to the robustness coefficient and gradient heterogeneity. Experimental validation on heterogeneous MNIST classification and synthetic $(L_0,L_1)$-smooth optimization problems demonstrates the effectiveness of our approach against various Byzantine attack strategies.
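The algorithmic ingredients named in the abstract (worker-side momentum, NNM pre-aggregation, a robust aggregator, and a normalized server step) can be sketched as follows. This is a minimal illustration, not the paper's actual pseudocode: the coordinate-wise median stands in for whichever robust aggregator the paper uses, and all function names, the $n - f$ neighbor count in NNM, and the numerical-safety constant are assumptions.

```python
import numpy as np

def coordinate_median(vectors):
    # Coordinate-wise median: one simple Byzantine-robust aggregator
    # (a stand-in; the paper may use a different aggregation rule).
    return np.median(np.stack(vectors), axis=0)

def nnm(vectors, f):
    # Nearest Neighbor Mixing: replace each worker's vector with the
    # average of its n - f nearest neighbors (including itself), which
    # dilutes the influence of up to f Byzantine inputs.
    V = np.stack(vectors)
    n = len(V)
    out = np.empty_like(V)
    for i in range(n):
        dists = np.linalg.norm(V - V[i], axis=1)
        nearest = np.argsort(dists)[: n - f]
        out[i] = V[nearest].mean(axis=0)
    return list(out)

def byz_nsgdm_step(x, momenta, grads, lr, beta, f):
    # One illustrative step: each worker updates its momentum buffer,
    # the server applies NNM then robust aggregation, and finally takes
    # a *normalized* step, which is what handles (L0, L1)-smoothness.
    for i, g in enumerate(grads):
        momenta[i] = beta * momenta[i] + (1.0 - beta) * g
    aggregated = coordinate_median(nnm(momenta, f))
    norm = max(np.linalg.norm(aggregated), 1e-12)  # guard against division by zero
    return x - lr * aggregated / norm, momenta
```

On a toy quadratic with one Byzantine worker sending a huge gradient, the NNM-plus-median pipeline discards the outlier and the normalized step still makes progress, which is the qualitative behavior the abstract claims.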
Submission Number: 100