**Keywords:**Optimization theory, Variance reduction, Subsampled Newton, Finite-sum minimization, stochastic optimization

**Abstract:**Stochastic variance reduction has proven effective at accelerating first-order algorithms for solving convex finite-sum optimization tasks such as empirical risk minimization. Yet, the benefits of variance reduction for first-order methods tend to vanish in the large-batch setting, i.e., when stochastic gradients are computed from very large mini-batches to leverage parallelization of modern computing architectures. On the other hand, incorporating second-order information via Newton-type methods has proven successful in improving the performance of large-batch algorithms. In this work, we show that, in the presence of second-order information, variance reduction in the gradient can provide significant convergence acceleration even when using extremely large-batch gradient estimates. To demonstrate this, we study a finite-sum minimization algorithm we call Stochastic Variance-Reduced Newton (SVRN). We show that SVRN provably accelerates existing stochastic Newton-type methods (such as Subsampled Newton), while retaining their parallelizable large-batch operations: The number of passes over the data is reduced from $O(\alpha\log(1/\epsilon))$ to $O\big(\frac{\log(1/\epsilon)}{\log(n)}\big)$, i.e., by a factor of $O(\alpha\log(n))$, where $n$ is the number of sum components and $\alpha$ is the approximation factor in the Hessian estimate. Surprisingly, this acceleration gets more significant the larger the data size $n$, and can be achieved with a unit step size.

**Submission Number:**59

Loading