TL;DR: Parameter-Free Dual Averaging is possible by estimating the distance to the solution
Abstract: Both gradient descent and dual averaging for convex Lipschitz functions have convergence rates that are highly dependent on the choice of learning rate. Even when the Lipschitz constant is known, setting the learning rate to achieve the optimal convergence rate requires knowing a bound $D$ on the distance from the initial point to the solution set. A number of approaches are known that relax this requirement, but they either require line searches, restarting (hyper-parameter grid search), or do not derive from the gradient descent or dual averaging frameworks (coin-betting). In this work we describe a single-pass method, with no backtracking or line searches, derived from dual averaging, which does not require knowledge of $D$ yet asymptotically achieves the optimal rate of convergence for the complexity class of convex Lipschitz functions.
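The core idea, adapting the dual averaging step size to a running lower-bound estimate of $D$, can be illustrated with a simplified one-dimensional sketch. This is an assumption-laden illustration, not the paper's exact algorithm: the function name `parameter_free_da`, the toy objective $f(x) = |x - 10|$, and the precise update order are all choices made here for exposition. The estimate uses the standard dual averaging inequality $\sum_k \langle g_k, x_k - x^* \rangle \ge 0$, which implies $D \ge \big(\sum_k \gamma_k \langle g_k, s_k \rangle\big) / \lVert s_{n+1} \rVert$, so the estimate never exceeds the true distance.

```python
import math

def parameter_free_da(grad, x0, G, n_steps, d0=1e-6):
    """Illustrative sketch: dual averaging with a growing lower-bound
    estimate d of the distance D = |x0 - x*|, for a 1-D convex
    G-Lipschitz objective. grad(x) must return a subgradient at x."""
    x = x0
    s = 0.0       # running sum of subgradients
    d = d0        # current lower-bound estimate of D (starts tiny)
    gamma = 0.0   # step size that produced the current iterate x
    numer = 0.0   # running sum of gamma_k * <g_k, s_k>
    for k in range(n_steps):
        g = grad(x)
        # accumulate with s *before* adding g, matching x = x0 - gamma*s
        numer += gamma * g * s
        s += g
        if s != 0.0:
            # provable lower bound on D from the dual averaging inequality
            d = max(d, numer / abs(s))
        # dual averaging step with the current distance estimate
        gamma = d / (G * math.sqrt(k + 1))
        x = x0 - gamma * s
    return x, d

# Toy problem: f(x) = |x - 10|, minimizer x* = 10, so D = 10 from x0 = 0.
x_final, d_est = parameter_free_da(
    lambda x: 1.0 if x > 10.0 else -1.0,  # a subgradient of |x - 10|
    x0=0.0, G=1.0, n_steps=5000)
```

Starting from a deliberately tiny `d0`, the estimate grows geometrically while the subgradients stay aligned, then saturates below the true $D$, after which the method behaves like dual averaging with a well-tuned step size.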