Abstract: Byzantine-robust distributed learning (BRDL), which refers to distributed learning that can work in the presence of potentially faulty or malicious workers (also known as Byzantine workers), has recently attracted much research attention. Robust aggregators are widely used in existing BRDL methods to obtain robustness against Byzantine workers. However, Byzantine workers do not always exist in applications. As far as we know, there is almost no existing work that theoretically investigates the effect of using robust aggregators when there are no Byzantine workers. To bridge this knowledge gap, we theoretically analyze the aggregation error of robust aggregators when there are no Byzantine workers. Specifically, we show that the worst-case aggregation error without Byzantine workers increases with the number of Byzantine workers that a robust aggregator can tolerate. This theoretical result reveals a tension between Byzantine robustness and no-attack accuracy, which in this paper refers to accuracy when no faulty or malicious workers are present. Furthermore, we provide lower bounds on the convergence rate of gradient descent with robust aggregators for non-convex objective functions and for objective functions satisfying the Polyak-Łojasiewicz (PL) condition, respectively, and we prove the tightness of these lower bounds. The lower bounds on the convergence rate reveal a similar tension between Byzantine robustness and no-attack accuracy. Empirical results further support our theoretical findings.
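The tension described in the abstract can be illustrated with a small sketch. The snippet below uses the coordinate-wise trimmed mean, one common robust aggregator (the paper's analysis covers robust aggregators in general, not this one specifically); the parameter `f` is the number of Byzantine workers to tolerate. In an attack-free setting where all workers hold noisy copies of the same true gradient, the output of the trimmed mean deviates from the plain mean once `f > 0`, a simple instance of the aggregation error discussed above. All names and the experimental setup here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def trimmed_mean(grads, f):
    """Coordinate-wise trimmed mean: for each coordinate, drop the f largest
    and f smallest values across workers, then average the remaining ones."""
    sorted_g = np.sort(grads, axis=0)        # sort each coordinate over workers
    kept = sorted_g[f:len(grads) - f]        # discard f extremes on each side
    return kept.mean(axis=0)

rng = np.random.default_rng(0)
n, d = 25, 10                                # workers, gradient dimension

# No-attack scenario: every worker holds a noisy copy of the true gradient.
true_grad = rng.normal(size=d)
grads = true_grad + rng.normal(scale=1.0, size=(n, d))

plain_mean = grads.mean(axis=0)              # the ideal attack-free aggregate
for f in (0, 4, 8):
    agg = trimmed_mean(grads, f)
    err = np.linalg.norm(agg - plain_mean)   # aggregation error vs. plain mean
    print(f"f = {f}: aggregation error = {err:.4f}")
```

With `f = 0` the trimmed mean coincides with the plain mean (zero error); with larger `f`, more honest gradient information is discarded and the aggregate drifts away from the plain mean, mirroring the worst-case behavior that the paper analyzes.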
Lay Summary: Byzantine-robust distributed learning (BRDL), which refers to distributed learning with potentially faulty workers (also known as Byzantine workers), has recently attracted much research attention. Robust aggregators are widely used in existing BRDL methods to obtain robustness against Byzantine workers. However, Byzantine workers do not always exist in applications. As far as we know, there is almost no existing work that theoretically investigates the effect of using robust aggregators when there are no Byzantine workers. In view of this gap, we theoretically reveal a tension between Byzantine robustness and no-attack accuracy. Specifically, we prove that obtaining robustness against more Byzantine workers inevitably degrades the worst-case performance in no-attack cases.
Primary Area: Optimization->Large Scale, Parallel and Distributed
Keywords: distributed machine learning, Byzantine robustness, robust aggregation
Submission Number: 121