AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods
Abstract: We present AI-SARAH, a practical variant of SARAH. Like SARAH, the algorithm employs the stochastic recursive gradient, but it adjusts the step-size based on local geometry. AI-SARAH computes the step-size implicitly and efficiently estimates the local Lipschitz smoothness of the stochastic functions. It is fully adaptive, tune-free, straightforward to implement, and computationally efficient. We provide technical insight and intuitive illustrations of its design and convergence. We conduct an extensive empirical analysis and demonstrate its strong performance compared with its classical counterparts and other state-of-the-art first-order methods on convex machine learning problems.
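For context, the stochastic recursive gradient estimator that the abstract refers to is the one introduced by classical SARAH: an outer loop takes a full-gradient snapshot, and an inner loop updates the estimator recursively as v_t = ∇f_i(w_t) − ∇f_i(w_{t−1}) + v_{t−1}. The sketch below illustrates plain SARAH with a fixed step-size on a toy least-squares problem (AI-SARAH's implicit, adaptive step-size computation is described in the paper itself and is not reproduced here); the problem data, function names, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Toy least-squares problem: f(w) = (1/n) * sum_i 0.5 * (a_i . w - b_i)^2
rng = np.random.default_rng(0)
n, d = 50, 5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)

def grad_i(w, i):
    # Gradient of a single component f_i at w.
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    # Full gradient, used once per outer loop (the snapshot).
    return A.T @ (A @ w - b) / n

def loss(w):
    return 0.5 * np.mean((A @ w - b) ** 2)

def sarah(w0, eta=0.05, outer=5, inner=50, seed=1):
    # Plain SARAH with a fixed step-size eta (AI-SARAH would set eta adaptively).
    sampler = np.random.default_rng(seed)
    w_prev = w0.copy()
    for _ in range(outer):
        v = full_grad(w_prev)          # v_0: exact gradient at the snapshot
        w = w_prev - eta * v
        for _ in range(inner):
            i = sampler.integers(n)
            # SARAH recursion: v_t = grad_i(w_t) - grad_i(w_{t-1}) + v_{t-1}
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev, w = w, w - eta * v
    return w

w0 = np.zeros(d)
w = sarah(w0)
print(loss(w0), loss(w))
```

With a step-size below 1/L the recursive estimator keeps variance low between snapshots, which is what gives SARAH-type methods linear convergence on strongly convex problems.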
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Compared with the original submission, removed text is shown in red and all additions are in blue. Responses to the reviewers' concerns are given in the replies to the reviews. The most extensive changes are in Section 3, where we modified Algorithm 1 to cover both uniform and importance sampling, and made the algorithm more self-contained. In Section 4, we added a statement that the practical version uses uniform sampling only (hence we did not develop the practical algorithm with importance sampling); instead, we propose to sample a mini-batch uniformly at random. Note that Theorem 3.1 is valid for both samplings; however, L^t is defined differently for the different sampling strategies. Section 1.2 now contains more details about related work on adaptive algorithms (such as the stochastic Polyak step-size and its extensions). We also included a comparison with Adam, AdaGrad, and RMSProp in Figure 1, as requested by a reviewer.
Assigned Action Editor: ~Daniel_M_Roy1
Submission Number: 462