Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity

Sally Dong; Haotian Jiang; Yin Tat Lee; Swati Padmanabhan; Guanghao Ye

Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity

Sally Dong, Haotian Jiang, Yin Tat Lee, Swati Padmanabhan, Guanghao Ye

Published: 31 Oct 2022, Last Modified: 11 Jan 2023NeurIPS 2022 AcceptReaders: Everyone

Keywords: decomposable, non-smooth convex optimization, gradient oracle complexity, submodular function minimization

Abstract: Many fundamental problems in machine learning can be formulated by the convex program \[ \min_{\theta\in \mathbb{R}^d}\ \sum_{i=1}^{n}f_{i}(\theta), \] where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $\theta$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $\widetilde{O}(\sum_{i=1}^n d_i \log (1 /\epsilon))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires $O(nd \log (1/\epsilon))$ gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by [Axiotis, Karczmarz, Mukherjee, Sankowski and Vladu, ICML 2021]. Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.

Supplementary Material: pdf

15 Replies

Loading