Distributed Adaptive Subgradient Algorithms for Online Learning Over Time-Varying Networks
Abstract: The computational bottleneck in distributed optimization methods based on projected gradient descent lies in the computation of a full gradient vector and the projection step, which is particularly problematic for large datasets. To reduce the
computational complexity of existing methods, we combine the randomized block-coordinate descent and the Frank–Wolfe
techniques, and then propose a distributed randomized block-coordinate projection-free algorithm over networks, where each
agent randomly chooses a subset of the coordinates of its gradient vector and the projection step is eschewed in favor of a
much simpler linear optimization step. Moreover, we theoretically analyze the convergence of the proposed algorithm. Specifically, we rigorously prove that the proposed algorithm converges to an optimal point at a rate of $O(1/t)$ under
convexity and $O(1/t^{2})$ under strong convexity, respectively, where $t$ is the number of iterations. Furthermore, the proposed algorithm converges to a stationary point, where the Frank–Wolfe gap is equal to zero, at a rate of $O(1/\sqrt{t})$ under nonconvexity. To evaluate the computational benefit of the proposed algorithm, we apply it to
multiclass classification problems in simulation experiments on two datasets, aloi and news20. The results show that the proposed algorithm is faster than existing distributed optimization algorithms due to its lower per-iteration computational cost. Furthermore, the results also show that well-connected or smaller graphs lead to faster convergence, which confirms the theoretical results.
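
To make the algorithmic idea concrete, below is a minimal sketch (not the authors' implementation) of a distributed randomized block-coordinate Frank–Wolfe update. It uses illustrative ingredients that are assumptions rather than details from the paper: quadratic local losses, a box constraint whose linear minimization oracle separates across coordinates, a ring network with a doubly stochastic mixing matrix W, and the standard 2/(t+2) step size.

```python
import numpy as np

# Hypothetical sketch of a distributed randomized block-coordinate
# Frank-Wolfe step; losses, constraint set, and network are illustrative.

rng = np.random.default_rng(0)
n_agents, d, r, block_size, T = 5, 20, 1.0, 5, 200

# Hypothetical local quadratic losses f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = [rng.standard_normal((30, d)) for _ in range(n_agents)]
b = [rng.standard_normal(30) for _ in range(n_agents)]

# Ring network: each agent mixes with its two neighbors (doubly stochastic W).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 1 / 3
    W[i, (i - 1) % n_agents] = 1 / 3
    W[i, (i + 1) % n_agents] = 1 / 3

X = np.zeros((n_agents, d))          # local iterates, one row per agent

for t in range(1, T + 1):
    gamma = 2.0 / (t + 2)            # standard Frank-Wolfe step size
    X = W @ X                        # consensus step: average with neighbors
    for i in range(n_agents):
        # Each agent evaluates its gradient only on a random coordinate block.
        block = rng.choice(d, size=block_size, replace=False)
        grad_block = A[i][:, block].T @ (A[i] @ X[i] - b[i])
        # Linear minimization oracle over the box [-r, r]^d restricted to the
        # block: the minimizing vertex is -r * sign(gradient), no projection.
        s_block = -r * np.sign(grad_block)
        # Move only the chosen block toward the oracle solution.
        X[i, block] += gamma * (s_block - X[i, block])

print("consensus residual:", np.linalg.norm(X - X.mean(axis=0)))
```

Because the linear minimization oracle only touches the sampled block, each agent avoids both the full gradient computation and the projection step, which is the source of the per-iteration savings described above.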