Exploiting Local Convergence of Quasi-Newton Methods Globally: Adaptive Sample Size Approach

Qiujiang Jin; Aryan Mokhtari

Published: 09 Nov 2021, Last Modified: 05 May 2023, NeurIPS 2021 Poster, Readers: Everyone

Keywords: Quasi-Newton methods, adaptive sample size methods, second-order methods, large-scale optimization.

TL;DR: We present a novel adaptive sample size quasi-Newton method that exploits the superlinear convergence rate of quasi-Newton methods throughout the entire training process to solve large-scale ERM problems efficiently.

Abstract: In this paper, we study the application of quasi-Newton methods for solving empirical risk minimization (ERM) problems defined over a large dataset. Traditional deterministic and stochastic quasi-Newton methods can be used to solve such problems; however, it is known that their global convergence rate may be no better than that of first-order methods, and their local superlinear convergence appears only towards the end of the learning process. We use an adaptive sample size scheme that exploits the superlinear convergence of quasi-Newton methods globally, throughout the entire learning process. The main idea of the proposed adaptive sample size algorithms is to start with a small subset of data points and solve the corresponding ERM problem to within its statistical accuracy, then enlarge the sample size geometrically and use the solution of the problem on the smaller set as the initial point for the subsequent ERM problem with more samples. We show that if the initial sample size is sufficiently large and we use quasi-Newton methods to solve each subproblem, the subproblems can be solved superlinearly fast (after at most three iterations), as we guarantee that the iterates always stay within a neighborhood in which quasi-Newton methods converge superlinearly. Numerical experiments on various datasets confirm our theoretical results and demonstrate the computational advantages of our method.
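The scheme described in the abstract (solve a small ERM problem, grow the sample geometrically, warm-start the next problem) can be illustrated with a minimal sketch. This is not the authors' algorithm or code: it assumes an L2-regularized logistic loss on synthetic data, a textbook BFGS update with backtracking line search and an identity initial matrix, a fixed regularization constant `reg`, and a hypothetical statistical accuracy `V_m = 1/m`. All function names and parameter values are illustrative, and the sketch does not reproduce the paper's three-iteration guarantee.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def make_data(n, d, seed=0):
    """Synthetic binary classification data (hypothetical, for illustration)."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0, 1) for _ in range(d)]
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(d)]
        p = 1.0 / (1.0 + math.exp(-dot(x, w_true)))
        X.append(x)
        y.append(1.0 if rng.random() < p else -1.0)
    return X, y

def loss_grad(w, X, y, reg):
    """L2-regularized logistic loss and its gradient on the given samples."""
    n = len(X)
    loss = 0.5 * reg * dot(w, w)
    grad = [reg * wi for wi in w]
    for x, yi in zip(X, y):
        z = yi * dot(x, w)
        # Numerically stable log(1 + exp(-z)).
        loss += (max(-z, 0.0) + math.log1p(math.exp(-abs(z)))) / n
        # 1 / (1 + exp(z)) written with tanh to avoid overflow.
        coef = -yi * 0.5 * (1.0 - math.tanh(0.5 * z)) / n
        for j, xj in enumerate(x):
            grad[j] += coef * xj
    return loss, grad

def bfgs(w, X, y, reg, tol, max_iter=100):
    """Textbook BFGS (inverse-Hessian form) with Armijo backtracking."""
    d = len(w)
    H = [[float(i == j) for j in range(d)] for i in range(d)]
    f, g = loss_grad(w, X, y, reg)
    iters = 0
    while math.sqrt(dot(g, g)) > tol and iters < max_iter:
        p = [-dot(row, g) for row in H]  # quasi-Newton direction
        t = 1.0
        while True:
            w_new = [wi + t * pi for wi, pi in zip(w, p)]
            f_new, g_new = loss_grad(w_new, X, y, reg)
            if f_new <= f + 1e-4 * t * dot(g, p) or t < 1e-10:
                break
            t *= 0.5
        s = [a - b for a, b in zip(w_new, w)]
        u = [a - b for a, b in zip(g_new, g)]
        su = dot(s, u)
        if su > 1e-12:  # update only when the curvature condition holds
            Hu = [dot(row, u) for row in H]
            uHu = dot(u, Hu)
            for i in range(d):
                for j in range(d):
                    H[i][j] += ((su + uHu) * s[i] * s[j] / su ** 2
                                - (Hu[i] * s[j] + s[i] * Hu[j]) / su)
        w, f, g = w_new, f_new, g_new
        iters += 1
    return w, iters

def adaptive_sample_size(X, y, m0=64, growth=2, reg=0.1, max_iter=100):
    """Solve ERM on growing subsets, warm-starting each stage."""
    n, d = len(X), len(X[0])
    w, m, history = [0.0] * d, min(m0, n), []
    while True:
        # Assumed accuracy V_m = 1/m. For a reg-strongly convex objective,
        # ||grad|| <= sqrt(2 * reg * V_m) implies suboptimality <= V_m.
        tol = math.sqrt(2.0 * reg / m)
        w, iters = bfgs(w, X[:m], y[:m], reg, tol, max_iter)
        history.append((m, iters))
        if m == n:
            return w, history
        m = min(growth * m, n)  # geometric increase of the sample size

X_demo, y_demo = make_data(1024, 3)
w_demo, history_demo = adaptive_sample_size(X_demo, y_demo)
for m, iters in history_demo:
    print(f"sample size {m}: {iters} BFGS iterations")
```

The printed history shows how many BFGS iterations each stage needed after being warm-started from the previous, smaller problem. The gradient-norm stopping rule is a convenient sufficient condition under strong convexity, chosen here for simplicity rather than taken from the paper.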

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf
