Keywords: Optimizers, interpolation, convergence
TL;DR: TrailMix is a convex framework that provably preserves convergence rates while adaptively interpolating a wide range of optimizers, acting like an ensemble when they are complementary and collapsing onto the best one when it dominates.
Abstract: Optimizers are central to modern deep learning, yet no single algorithm consistently excels across architectures or datasets. Existing methods for adaptively mixing optimizers to combine complementary strengths are promising, but they are restricted to narrow optimizer families or lack rigorous guarantees, leaving a gap between theory and practice. To fill this gap, we present TrailMix, an adaptive interpolation framework that is general across all first- and quasi-second-order methods. On the theoretical front, we prove that convex combinations of optimizers satisfying a mild alignment condition preserve standard convergence rates in non-convex, convex, and strongly convex or PL regimes. For the challenging same-timescale setting, we establish a novel analysis method by lifting the stochastic dynamics to a population-level Fokker-Planck PDE, for which we prove stability using a joint free-energy Lyapunov function. Algorithmically, we extend this framework with fairness normalization, trust-region clipping, and a curvature-awareness reward that stabilizes the meta-weights and enables smoother training. These additions allow TrailMix to behave like an ensemble when optimizers are complementary and to concentrate weight when one dominates, without breaking convexity. Our empirical evaluations on an optimizer set including AdamW, Lion, SOAP, Scion, and MARS show that TrailMix consistently matches or outperforms the strongest single optimizer across a wide range of analytic loss surfaces.
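Illustrative sketch (not from the paper): the abstract's central idea, a convex combination of optimizer updates with adaptive meta-weights, can be pictured with a toy example. Everything below is an assumption made for illustration: the two base update rules (a plain gradient step and a sign-based step), the softmax parameterization of the weights, the loss-decrease reward, and the names `mix_run`, `sgd_step`, `sign_step`, and `eta` are hypothetical and do not reproduce TrailMix's fairness normalization, trust-region clipping, or curvature-awareness reward.

```python
import math

def loss(x):
    # Ill-conditioned quadratic used as a toy analytic loss surface.
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad(x):
    # Exact gradient of the quadratic above.
    return [x[0], 10.0 * x[1]]

def sgd_step(g, lr=0.05):
    # Hypothetical base optimizer 1: plain gradient step.
    return [-lr * gi for gi in g]

def sign_step(g, lr=0.01):
    # Hypothetical base optimizer 2: sign-based step.
    return [-lr * math.copysign(1.0, gi) if gi != 0 else 0.0 for gi in g]

def softmax(z):
    # Numerically stable softmax; the output lies on the probability simplex.
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def mix_run(x0, steps=200, eta=5.0):
    x = list(x0)
    logits = [0.0, 0.0]  # start from uniform meta-weights
    for _ in range(steps):
        g = grad(x)
        updates = [sgd_step(g), sign_step(g)]
        w = softmax(logits)
        # Assumed reward: loss decrease each candidate update would achieve alone.
        rewards = [
            loss(x) - loss([xi + ui for xi, ui in zip(x, u)]) for u in updates
        ]
        # Convex combination of the candidate updates (weights sum to 1).
        x = [
            xi + sum(wk * u[i] for wk, u in zip(w, updates))
            for i, xi in enumerate(x)
        ]
        # Exponentiated-gradient style meta-weight update (an assumption).
        logits = [lk + eta * rk for lk, rk in zip(logits, rewards)]
    return x, softmax(logits)

if __name__ == "__main__":
    x_final, weights = mix_run([3.0, 2.0])
    print("final loss:", loss(x_final))
    print("meta-weights:", weights)
```

Because the weights come from a softmax, the combined step always stays inside the convex hull of the candidate updates. In this toy setting the weights drift toward whichever candidate yields the larger loss decrease, which loosely mirrors the "concentrate weight when one dominates" behavior described above.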
Supplementary Material: zip
Primary Area: optimization
Submission Number: 23738