Abstract: Distributed adaptive stochastic gradient methods are widely used for large-scale nonconvex optimization, such as training deep learning models. However, their iteration complexity for finding $\varepsilon$-stationary points has rarely been analyzed in the nonconvex setting. In this work, we present a novel communication-efficient distributed Adam in the parameter-server model for stochastic nonconvex optimization, dubbed {\em Efficient-Adam}. Specifically, we incorporate a two-way quantization scheme into Efficient-Adam to reduce the communication cost between the workers and the server. Simultaneously, we adopt a two-way error-feedback strategy to reduce the biases caused by the two-way quantization on both the server and the workers. In addition, we establish the iteration complexity of the proposed Efficient-Adam for a class of quantization operators, and further characterize its communication complexity between the server and the workers when an $\varepsilon$-stationary point is achieved.
Finally, we apply Efficient-Adam to a toy stochastic convex optimization problem and to training deep learning models on real-world vision and language tasks. Extensive experimental results, together with the theoretical guarantees, demonstrate the merits of Efficient-Adam.
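To make the compression scheme described in the abstract concrete, below is a minimal, hypothetical sketch of a single worker-side step that combines quantization with error feedback. The quantizer, the error buffer `e`, and all function names are illustrative assumptions for the general technique, not the paper's exact Efficient-Adam algorithm.

```python
import numpy as np

def quantize(v, num_levels=16):
    """Simple uniform quantizer used as a stand-in for a generic
    quantization operator (assumption, not the paper's operator class)."""
    scale = np.max(np.abs(v)) + 1e-12
    return np.round(v / scale * (num_levels - 1)) / (num_levels - 1) * scale

def worker_step(grad, error_buffer):
    """Compress the gradient plus the accumulated compression error;
    keep the residual locally so the quantization bias does not
    accumulate across iterations (error feedback)."""
    corrected = grad + error_buffer       # add back past compression error
    compressed = quantize(corrected)      # low-precision message sent to the server
    new_error = corrected - compressed    # residual retained on the worker
    return compressed, new_error

# Usage: each worker maintains its own error buffer across iterations.
e = np.zeros(4)
g = np.array([0.01, -0.2, 0.5, 0.003])
msg, e = worker_step(g, e)
print(msg, e)
```

In the two-way scheme sketched in the abstract, the server would apply the same compress-and-feed-back pattern to the messages it broadcasts back to the workers.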
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): /references/pdf?id=upJv2dEcj