Keywords: Polyak-Łojasiewicz Condition, First-order Algorithms, Lower Bound, Complexity
TL;DR: We show that, for any $\alpha>0$, any first-order algorithm requires at least $\tilde{\Omega}\left((L/\mu)^{1-\alpha}\right)$ gradient costs to find an $\epsilon$-approximate optimal solution of a general $L$-smooth, $\mu$-PL function.
Abstract: The Polyak-Łojasiewicz (PL) condition [Polyak, 1963] is weaker than strong convexity but suffices to ensure global convergence of the Gradient Descent algorithm. In this paper, we study lower bounds for algorithms that use first-order oracles to find an approximate optimal solution. We show that, for any $\alpha>0$, any first-order algorithm requires at least $\Omega\left((L/\mu)^{1-\alpha} \right)$ gradient costs to find an $\epsilon$-approximate optimal solution of a general $L$-smooth, $\mu$-PL function. This result demonstrates the near-optimality of the Gradient Descent algorithm for minimizing smooth PL functions, in the sense that there exists a ``hard'' PL function on which no first-order algorithm can be faster than Gradient Descent by a polynomial factor. In contrast, it is well known that the momentum technique, e.g. [Nesterov, 2003, Chap. 2], provably accelerates Gradient Descent to $O\left(\sqrt{L/\hat{\mu}}\log\frac{1}{\epsilon}\right)$ gradient costs for functions that are $L$-smooth and $\hat{\mu}$-strongly convex. Therefore, our result separates the hardness of minimizing a smooth PL function from that of minimizing a smooth strongly convex function, since the complexity of the former cannot be improved by any polynomial factor in general.
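For context, the following standard calculation is a minimal sketch, not taken from the submission, of why the PL condition yields the $O\left((L/\mu)\log\frac{1}{\epsilon}\right)$ upper bound for Gradient Descent that the stated lower bound nearly matches; it assumes the common conventions that $f^*$ denotes the global minimum value of $f$ and that Gradient Descent uses step size $1/L$.

```latex
% Standard argument (Polyak, 1963): linear convergence of Gradient Descent
% under the PL condition. Here f^* is the global minimum value of f, and the
% iterates follow x_{k+1} = x_k - (1/L) \nabla f(x_k).
\begin{align*}
  &\text{PL condition:} &&\tfrac{1}{2}\|\nabla f(x)\|^2 \ge \mu\bigl(f(x) - f^*\bigr)
      \quad \text{for all } x, \\
  &\text{$L$-smoothness (step size $1/L$):} &&f(x_{k+1}) \le f(x_k) - \tfrac{1}{2L}\|\nabla f(x_k)\|^2, \\
  &\text{combining the two:} &&f(x_{k+1}) - f^* \le \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(f(x_k) - f^*\bigr), \\
  &\text{hence} &&f(x_k) - f^* \le \epsilon \text{ after }
      k = O\!\left(\tfrac{L}{\mu}\log\tfrac{f(x_0) - f^*}{\epsilon}\right) \text{ iterations.}
\end{align*}
```

Up to logarithmic factors and the arbitrarily small exponent $\alpha$, this upper bound matches the $\Omega\left((L/\mu)^{1-\alpha}\right)$ lower bound stated above, which is the sense in which the paper calls Gradient Descent nearly optimal for smooth PL functions.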
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Optimization (eg, convex and non-convex optimization)
Supplementary Material: zip