TL;DR: Parameter-free dual averaging is possible by estimating the distance to the solution.
Abstract: Both gradient descent and dual averaging for convex Lipschitz functions have convergence rates that are highly dependent on the choice of learning rate. Even when the Lipschitz constant is known, setting the learning rate to achieve the optimal convergence rate requires knowing a bound $D$ on the distance from the initial point to the solution set. A number
of approaches relax this requirement, but they either require line searches or restarting (hyper-parameter grid search), or they do not derive
from the gradient descent or dual averaging frameworks (coin-betting).
In this work we describe a single-pass method derived from dual averaging, with no back-tracking or line searches,
which does not require knowledge of $D$ yet asymptotically achieves
the optimal rate of convergence for the complexity class of convex
Lipschitz functions.
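
To make the idea concrete, below is a minimal sketch, not the paper's algorithm, of a dual averaging loop that replaces the unknown distance $D$ with a running lower-bound estimate that only grows. The specific estimate update (`d_hat`) and all names (`grad`, `x0`, `n_steps`, `d0`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def parameter_free_dual_averaging(grad, x0, n_steps=1000, d0=1e-6, eps=1e-12):
    """Dual averaging with a growing estimate of the distance to the solution.

    Sketch only: the d_hat update rule below is an assumption for illustration.
    """
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    s = np.zeros_like(x)          # running sum of (sub)gradients
    g_sq_sum = 0.0                # running sum of squared gradient norms
    weighted_g_sq = 0.0           # running sum of gamma_k * ||g_k||^2
    d = d0                        # current lower-bound estimate of D
    for _ in range(n_steps):
        g = grad(x)
        s += g
        g_sq_sum += float(np.dot(g, g))
        gamma = d / (np.sqrt(g_sq_sum) + eps)          # step size scaled by the D estimate
        weighted_g_sq += gamma * float(np.dot(g, g))
        s_norm = float(np.linalg.norm(s))
        # Assumed lower bound: if the weighted dual iterate has moved far from x0,
        # the true distance D must be at least this large, so the estimate grows.
        d_hat = (gamma * s_norm ** 2 - weighted_g_sq) / (2.0 * s_norm + eps)
        d = max(d, d_hat)
        x = x0 - gamma * s                              # dual averaging step anchored at x0
    return x

# Hypothetical usage on a simple convex Lipschitz objective f(x) = ||x - 3||_1:
if __name__ == "__main__":
    grad = lambda x: np.sign(x - 3.0)
    print(parameter_free_dual_averaging(grad, x0=np.zeros(5), n_steps=5000))
```

Because the estimate of $D$ can only increase, an overly small initial guess `d0` is corrected automatically as the iterates move, which is the behaviour the abstract attributes to estimating the distance to the solution.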