Abstract: We study the impact of the constraint set and gradient geometry on the convergence of online and stochastic methods for convex optimization, providing a characterization of the geometries for which stochastic gradient and adaptive gradient methods are (minimax) optimal. In particular, we show that when the constraint set is quadratically convex, AdaGrad and related stochastic gradient methods are minimax optimal, and we provide a converse showing that when the constraints are not quadratically convex---for example, any $\ell_p$-ball for $p \le 2$---these methods are far from optimal. Based on this characterization, we provide concrete recommendations for when one should use adaptive gradient methods and when one should not.
Code Link: This paper does not contain experiments.
CMT Num: 6136