Distributed Adaptive Subgradient Algorithms for Online Learning Over Time-Varying Networks
Abstract: The computational bottleneck in distributed optimization methods based on projected gradient descent lies in the computation of a full gradient vector and the projection step, which is particularly problematic for large datasets. To reduce the
computational complexity of existing methods, we combine the randomized block-coordinate descent and the Frank–Wolfe
techniques, and then propose a distributed randomized block-coordinate projection-free algorithm over networks, where each
agent randomly chooses a subset of the coordinates of its gradient vector and the projection step is eschewed in favor of a
much simpler linear optimization step. Moreover, we theoretically analyze the convergence of the proposed algorithm. Specifically, we rigorously prove that the proposed algorithm converges to an optimal point at a rate of $O(1/t)$ under
convexity and $O(1/t^{2})$ under strong convexity, respectively, where $t$ is the number of iterations. Furthermore, the proposed algorithm converges to a stationary point, where the Frank–Wolfe gap is equal to zero, at a rate of $O(1/\sqrt{t})$ under nonconvexity. To evaluate the computational benefit of the proposed algorithm, we apply it to
multiclass classification problems in simulation experiments on two datasets, aloi and news20. The results show that the proposed algorithm is faster than existing distributed optimization algorithms due to its lower per-iteration computational cost. Furthermore, the results also show that well-connected or smaller graphs lead to faster convergence, which confirms the theoretical results.
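
To make the algorithmic idea concrete, below is a minimal sketch (not the authors' implementation) of a distributed randomized block-coordinate Frank–Wolfe update. It uses illustrative ingredients that are assumptions rather than details from the paper: quadratic local losses, a box constraint whose linear minimization oracle separates across coordinates, a ring network with a doubly stochastic mixing matrix W, and the standard 2/(t+2) step size.

```python
import numpy as np

# Hypothetical sketch of a distributed randomized block-coordinate
# Frank-Wolfe step; losses, constraint set, and network are illustrative.

rng = np.random.default_rng(0)
n_agents, d, r, block_size, T = 5, 20, 1.0, 5, 200

# Hypothetical local quadratic losses f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = [rng.standard_normal((30, d)) for _ in range(n_agents)]
b = [rng.standard_normal(30) for _ in range(n_agents)]

# Ring network: each agent mixes with its two neighbors (doubly stochastic W).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 1 / 3
    W[i, (i - 1) % n_agents] = 1 / 3
    W[i, (i + 1) % n_agents] = 1 / 3

X = np.zeros((n_agents, d))          # local iterates, one row per agent

for t in range(1, T + 1):
    gamma = 2.0 / (t + 2)            # standard Frank-Wolfe step size
    X = W @ X                        # consensus step: average with neighbors
    for i in range(n_agents):
        # Each agent evaluates its gradient only on a random coordinate block.
        block = rng.choice(d, size=block_size, replace=False)
        grad_block = A[i][:, block].T @ (A[i] @ X[i] - b[i])
        # Linear minimization oracle over the box [-r, r]^d restricted to the
        # block: the minimizing vertex is -r * sign(gradient), no projection.
        s_block = -r * np.sign(grad_block)
        # Move only the chosen block toward the oracle solution.
        X[i, block] += gamma * (s_block - X[i, block])

print("consensus residual:", np.linalg.norm(X - X.mean(axis=0)))
```

Because the linear minimization oracle only touches the sampled block, each agent avoids both the full gradient computation and the projection step, which is the source of the per-iteration savings described above.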