DADA: Dual Averaging with Distance Adaptation

Published: 07 Dec 2024, Last Modified: 12 Dec 2024 · NeurIPS 2024 Workshop · CC BY 4.0
Keywords: Convex Optimization, Constrained Optimization, Gradient Methods, Adaptive Algorithms, Parameter-Free Methods, Dual Averaging, Distance Adaptation, Universal Methods, Worst-Case Complexity Guarantees
TL;DR: We introduce DADA, a parameter-free dual averaging algorithm for convex optimization that adapts dynamically without prior knowledge of problem-specific parameters. DADA is universal, working across diverse problem classes.
Abstract: We present a novel parameter-free universal gradient method for solving convex optimization problems. Our algorithm, Dual Averaging with Distance Adaptation (DADA), is based on the classical scheme of dual averaging and dynamically adjusts its coefficients based on the observed gradients and the distance from its iterates to the starting point, without requiring knowledge of any problem-specific parameters. DADA is a universal algorithm that simultaneously works for a wide range of problem classes, as long as one can bound the local growth of the objective around its minimizer. Particular examples of such problem classes are nonsmooth Lipschitz functions, Lipschitz-smooth functions, Hölder-smooth functions, functions with high-order Lipschitz derivative, quasi-self-concordant functions, and (L0, L1)-smooth functions. Furthermore, in contrast to many existing methods, DADA is suitable not only for unconstrained problems but also for constrained ones, possibly with unbounded domain, and it requires fixing neither the number of iterations nor the target accuracy in advance.
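For intuition, here is a minimal Python sketch of a dual-averaging iteration whose regularization coefficient adapts to the observed gradients and to the distance between the iterates and the starting point. The specific coefficient rule shown (AdaGrad-style gradient accumulation divided by a running distance estimate), the `grad` and `project` oracles, and all names are illustrative assumptions; the actual coefficient and weighting rules used by DADA are those defined in the paper.

```python
# Illustrative sketch of dual averaging with a distance-adaptive coefficient.
# Not the paper's exact method: the beta rule below is a placeholder.
import numpy as np

def dual_averaging_distance_adaptive(grad, project, x0, n_steps=1000, eps=1e-12):
    """grad(x) -> (sub)gradient at x; project(y) -> Euclidean projection onto the feasible set."""
    x = x0.copy()
    g_sum = np.zeros_like(x0)   # accumulated (sub)gradients
    g_sq = 0.0                  # accumulated squared gradient norms
    dist = eps                  # running estimate of the distance from iterates to x0
    for _ in range(n_steps):
        g = grad(x)
        g_sum += g
        g_sq += float(np.dot(g, g))
        dist = max(dist, float(np.linalg.norm(x - x0)))
        beta = np.sqrt(g_sq) / dist          # distance-adaptive coefficient (assumed rule)
        # Dual-averaging step: argmin_x <g_sum, x> + (beta/2)||x - x0||^2 over the feasible set.
        x = project(x0 - g_sum / beta)
    return x
```

As a usage example, for minimizing a convex function over the nonnegative orthant one could pass `project = lambda y: np.maximum(y, 0.0)`; no Lipschitz or smoothness constants are supplied, which is the parameter-free behavior the abstract describes.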
Submission Number: 23
