Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning

Yuanyuan Liu, Fanhua Shang, Hongying Liu, Lin Kong, Licheng Jiao, Zhouchen Lin

IEEE Trans. Pattern Anal. Mach. Intell., 2021 (modified: 04 Oct 2022)

Abstract: Recently, many stochastic variance reduced alternating direction methods of multipliers (ADMMs), e.g., SAG-ADMM and SVRG-ADMM, have made notable progress, such as attaining a linear convergence rate for strongly convex (SC) problems. However, their best-known convergence rate for non-strongly convex (non-SC) problems is $\mathcal{O}(1/T)$, compared with the $\mathcal{O}(1/T^2)$ rate of accelerated deterministic algorithms, where $T$ is the number of iterations. Thus, a gap remains between the convergence rates of existing stochastic ADMMs and those of deterministic algorithms. To bridge this gap, we introduce a new momentum acceleration trick into stochastic variance reduced ADMM and propose a novel accelerated SVRG-ADMM method (called ASVRG-ADMM) for machine learning problems with the constraint $Ax + By = c$. We then design a linearized proximal update rule and a simple proximal update rule for the two classes of ADMM-style problems with $B = \tau I$ and $B \ne \tau I$, respectively, where $I$ is an identity matrix and $\tau$ is an arbitrary bounded constant. Note that our linearized proximal update rule avoids solving sub-problems iteratively. Moreover, we prove that ASVRG-ADMM converges linearly for SC problems.
In particular, ASVRG-ADMM improves the convergence rate from $\mathcal{O}(1/T)$ to $\mathcal{O}(1/T^2)$ for non-SC problems. Finally, we apply ASVRG-ADMM to various machine learning problems, e.g., graph-guided fused Lasso, graph-guided logistic regression, graph-guided SVM, generalized graph-guided fused Lasso, and multi-task learning, and show that ASVRG-ADMM consistently converges faster than state-of-the-art methods.
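The abstract describes ASVRG-ADMM only at a high level, so the sketch below does not implement it. It is a minimal sketch, assuming a least-squares loss, of a plain (non-accelerated) SVRG-ADMM-style baseline applied to a toy graph-guided fused Lasso problem $\min_x \frac{1}{n}\sum_i f_i(x) + \lambda\|Ax\|_1$. Introducing $y = Ax$ gives the constraint $Ax - y = 0$, which is the $B = \tau I$ class with $\tau = -1$ and $c = 0$. Here the x-step is linearized into a single gradient step (our assumption, not necessarily the paper's linearized proximal rule), so no sub-problem is solved iteratively, and the y-step is the exact soft-thresholding proximal update. The momentum acceleration of ASVRG-ADMM is omitted, and all function names, the step size `eta`, the penalty `beta`, the epoch length `2n`, and the synthetic data are hypothetical choices for illustration.

```python
import random

# Hypothetical toy instance of graph-guided fused Lasso (illustrative names only):
#   min_x (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2 + lam * ||A x||_1,
# split as f(x) + lam * ||y||_1 subject to A x - y = 0 (B = -I, c = 0).

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(M, v):
    """Computes M v for a dense matrix stored as a list of rows."""
    return [dot(row, v) for row in M]

def rmatvec(M, w):
    """Computes M^T w."""
    out = [0.0] * len(M[0])
    for row, w_k in zip(M, w):
        for j, m_kj in enumerate(row):
            out[j] += m_kj * w_k
    return out

def soft_threshold(v, kappa):
    """Proximal operator of kappa * ||.||_1, applied elementwise."""
    return [max(abs(t) - kappa, 0.0) * (1.0 if t > 0 else -1.0) for t in v]

def grad_i(x, a_i, b_i):
    """Gradient of the single-sample loss f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    r = dot(a_i, x) - b_i
    return [r * p for p in a_i]

def full_grad(x, data):
    """Gradient of f(x) = (1/n) * sum_i f_i(x)."""
    g = [0.0] * len(x)
    for a_i, b_i in data:
        for j, v in enumerate(grad_i(x, a_i, b_i)):
            g[j] += v / len(data)
    return g

def svrg_grad(x, x_snap, mu, a_i, b_i):
    """SVRG estimator: grad f_i(x) - grad f_i(x_snap) + grad f(x_snap)."""
    gx, gs = grad_i(x, a_i, b_i), grad_i(x_snap, a_i, b_i)
    return [p - q + m for p, q, m in zip(gx, gs, mu)]

def objective(x, data, A, lam):
    loss = sum(0.5 * (dot(a_i, x) - b_i) ** 2 for a_i, b_i in data) / len(data)
    return loss + lam * sum(abs(t) for t in matvec(A, x))

def svrg_admm(data, A, lam, beta=1.0, eta=0.05, epochs=30, seed=0):
    """Non-accelerated SVRG-ADMM sketch with a linearized x-step and scaled dual u."""
    rng = random.Random(seed)
    x = [0.0] * len(data[0][0])
    y, u = [0.0] * len(A), [0.0] * len(A)
    for _ in range(epochs):
        x_snap, mu = list(x), full_grad(x, data)  # snapshot and its full gradient
        for _ in range(2 * len(data)):
            a_i, b_i = data[rng.randrange(len(data))]
            g = svrg_grad(x, x_snap, mu, a_i, b_i)
            # x-step: one gradient step on the augmented Lagrangian, no inner solver.
            r = [ax - yk + uk for ax, yk, uk in zip(matvec(A, x), y, u)]
            pen = rmatvec(A, r)
            x = [xj - eta * (gj + beta * pj) for xj, gj, pj in zip(x, g, pen)]
            Ax = matvec(A, x)
            # y-step: exact proximal update for the l1 term.
            y = soft_threshold([ax + uk for ax, uk in zip(Ax, u)], lam / beta)
            u = [uk + ax - yk for uk, ax, yk in zip(u, Ax, y)]  # scaled dual ascent
    return x

# Usage on synthetic data: a chain graph over 5 features, so (A x)_k = x_k - x_{k+1}.
data_rng = random.Random(1)
x_true = [1.0, 1.0, 1.0, 0.0, 0.0]
data = []
for _ in range(20):
    a = [data_rng.uniform(-1.0, 1.0) for _ in range(5)]
    data.append((a, dot(a, x_true)))
A = [[1.0 if j == k else (-1.0 if j == k + 1 else 0.0) for j in range(5)] for k in range(4)]
lam = 0.1
x_hat = svrg_admm(data, A, lam)
print(objective([0.0] * 5, data, A, lam), objective(x_hat, data, A, lam))
```

Comparing the two printed objective values (all-zeros start versus final iterate) is a quick sanity check. The SVRG estimator is unbiased: averaging `svrg_grad` over all samples returns the full gradient at `x`, while its variance shrinks as `x` approaches the snapshot `x_snap`.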
