Keywords: Adaptive Optimization, Universal Gradient Method, Dual Averaging
TL;DR: Dual Averaging with Distance Adaptation (DADA), a novel parameter-free universal gradient method for solving convex optimization problems.
Abstract: We present a novel parameter-free universal gradient method for solving convex optimization problems. Our algorithm, Dual Averaging with Distance Adaptation (DADA), is based on the classical scheme of dual averaging and dynamically adjusts its coefficients using the observed gradients and the distance from its iterates to the starting point, without requiring knowledge of any problem-specific parameters. DADA is a universal algorithm that works simultaneously for a wide range of problem classes, provided one can bound the local growth of the objective around its minimizer. Particular examples of such problem classes are nonsmooth Lipschitz functions, Lipschitz-smooth functions, Hölder-smooth functions, functions with Lipschitz high-order derivatives, quasi-self-concordant functions, and (L0, L1)-smooth functions. Furthermore, in contrast to many existing methods, DADA is suitable not only for unconstrained problems but also for constrained ones, possibly with unbounded domains, and it requires fixing neither the number of iterations nor the target accuracy in advance.
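To make the abstract's description concrete, here is a minimal sketch of a dual-averaging loop with distance adaptation in the spirit described above. This is not the authors' exact DADA method: the abstract only states that the coefficients depend on the observed gradients and on the distance of the iterates from the starting point, so the specific coefficient rule below (largest observed distance divided by the root of accumulated squared gradient norms) is an illustrative assumption for the unconstrained Euclidean case.

```python
import numpy as np

def dual_averaging_distance_adaptation(grad, x0, n_steps=1000, r_eps=1e-6):
    """Illustrative dual-averaging loop with distance adaptation.

    NOT the paper's exact DADA algorithm: the coefficient rule below is a
    hypothetical stand-in consistent with the abstract's description
    (coefficients driven by observed gradients and by the distance of the
    iterates from the starting point x0).
    """
    x = x0.copy()
    g_sum = np.zeros_like(x0)   # running sum of gradients (dual averaging)
    g_sq = 0.0                  # running sum of squared gradient norms
    r_max = r_eps               # largest observed distance from x0 so far
    for _ in range(n_steps):
        g = grad(x)
        g_sum += g
        g_sq += float(np.dot(g, g))
        r_max = max(r_max, float(np.linalg.norm(x - x0)))
        # Dual-averaging step: move from the *starting point* along the
        # accumulated gradient direction, scaled by an adaptive coefficient
        # built from the observed distance and gradient magnitudes.
        x = x0 - (r_max / np.sqrt(g_sq + 1e-12)) * g_sum
    return x

# Usage: minimize f(x) = ||x||^2, whose gradient is 2x.
x_hat = dual_averaging_distance_adaptation(lambda x: 2.0 * x,
                                           np.array([5.0, -3.0]))
```

Note how no step size, smoothness constant, or iteration budget enters the update itself; this parameter-free character is the point of the method the abstract describes.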
Supplementary Material: zip
Primary Area: optimization
Submission Number: 20395