A Computation and Communication Efficient Projection-free Algorithm for Decentralized Constrained Optimization

Tong He; Hao Di; Haishan Ye; Xiangyu Chang; Guang Dai; Ivor Tsang

A Computation and Communication Efficient Projection-free Algorithm for Decentralized Constrained Optimization

Tong He, Hao Di, Haishan Ye, Xiangyu Chang, Guang Dai, Ivor Tsang

28 Sept 2024 (modified: 04 Dec 2024)ICLR 2025 Conference Withdrawn SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Decentralized stochastic optimization, variance reduction, Frank-Wolfe method

Abstract: Decentralized constrained optimization problems arise in numerous real-world applications, where a major challenge lies in the computational complexity of projecting onto complex sets, especially in large-scale systems. The projection-free method, Frank-Wolfe (FW), is popular for the constrained optimization problem with complex sets due to its efficiency in tackling the projection process. However, when applying FW methods to decentralized constrained finite-sum optimization problems, previous studies provide suboptimal incremental first-order oracle (IFO) bounds in both convex and non-convex settings. In this paper, we propose a stochastic algorithm named Decentralized Variance Reduction Gradient Tracking Frank-Wolfe ($\texttt{DVRGTFW}$), which incorporates the techniques of variance reduction, gradient tracking, and multi-consensus in the FW update to obtain tight bounds. We present a novel convergence analysis, diverging from previous decentralized FW methods, and demonstrating $\tilde{\mathcal{O}}(n+\sqrt{\frac{n}{m}}L\varepsilon^{-1})$ and $\mathcal{O}(\sqrt{\frac{n}{m}}L^2\varepsilon^{-2})$ IFO complexity bounds in convex and non-convex settings, respectively. To the best of our knowledge, these bounds are the best achieved in the literature to date. Besides, in the non-convex case, $\texttt{DVRGTFW}$ achieves $\mathcal{O}(\frac{L^2\varepsilon^{-2}}{\sqrt{1-\lambda_2(W)}})$ communication complexity which is closed to the lower bound $\Omega(\frac{L\varepsilon^{-2}}{\sqrt{1-\lambda_2(W)}})$. Empirical results validate the convergence properties of $\texttt{DVRGTFW}$ and highlight its superior performance over other related methods.

Supplementary Material: zip

Primary Area: optimization

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 13773

Loading