Abstract: Distributed Machine Learning (DML) is widely used to accelerate the training of deep learning models. In DML, Parameter-Server (PS) and Ring AllReduce are two typical architectures. Many recent works address the security problem in PS, whose performance can be greatly degraded by malicious participants during the training process. However, the robustness of Ring AllReduce, which can solve the communication bandwidth problem in PS, against malicious participants is still unknown. In this paper, we design a series of experiments to explore the security problem in Ring AllReduce, and reveal that it can also suffer from malicious participants.
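
To make the concern concrete, the following is a minimal single-process sketch of Ring AllReduce (a scatter-reduce phase followed by an all-gather phase) in which one participant may tamper with the values it forwards. It is not the experimental setup of the paper: the function name `ring_allreduce`, the `malicious` and `scale` parameters, and the attack itself (scaling every forwarded partial sum) are assumptions chosen only for illustration, and each chunk is reduced to a single number for brevity.

```python
def ring_allreduce(grads, malicious=None, scale=-10.0):
    """Simulate Ring AllReduce (sum) over n workers on a logical ring.

    grads[i] is worker i's local gradient, split into n chunks
    (one float per chunk here). `malicious` is the index of a
    hypothetical attacker that multiplies every value it forwards
    during scatter-reduce by `scale`; None means all workers are honest.
    Returns the final buffer held by each worker.
    """
    n = len(grads)
    bufs = [list(map(float, g)) for g in grads]
    assert all(len(b) == n for b in bufs), "need exactly n chunks per worker"

    # Phase 1: scatter-reduce. At each step, worker i sends chunk
    # (i - step) mod n to its right neighbour, which accumulates it.
    for step in range(n - 1):
        outgoing = []
        for i in range(n):
            chunk = (i - step) % n
            value = bufs[i][chunk]
            if i == malicious:
                value *= scale  # assumed attack: corrupt the partial sum
            outgoing.append((chunk, value))
        for i, (chunk, value) in enumerate(outgoing):
            bufs[(i + 1) % n][chunk] += value

    # Phase 2: all-gather. Worker i now owns the fully reduced chunk
    # (i + 1) mod n and circulates it; receivers overwrite their copy.
    for step in range(n - 1):
        outgoing = []
        for i in range(n):
            chunk = (i + 1 - step) % n
            outgoing.append((chunk, bufs[i][chunk]))
        for i, (chunk, value) in enumerate(outgoing):
            bufs[(i + 1) % n][chunk] = value

    return bufs


local_grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
honest = ring_allreduce(local_grads)
poisoned = ring_allreduce(local_grads, malicious=1)
print("honest:  ", honest[0])
print("poisoned:", poisoned[0])
```

In this toy run every worker still ends up with the same vector, so the corruption is not visible as a disagreement between workers. Because each participant forwards partial sums that already contain other workers' contributions, a single tampering node alters the aggregate that all workers receive.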