Keywords: optimization, distributionally robust optimization, deep learning
TL;DR: We present an adaptive distributionally robust optimizer for DL, prove its convergence in the non-convex setting, and provide an empirical evaluation.
Abstract: While traditional Deep Learning (DL) optimization methods treat all training samples equally, Distributionally Robust Optimization (DRO) adaptively assigns importance weights to different samples. However, a significant gap exists between DRO and current DL practice: modern DL optimizers require adaptivity and the ability to handle stochastic gradients, properties that underpin their superior performance. This paper aims to bridge this gap by introducing ALSO -- Adaptive Loss Scaling Optimizer -- an adaptive DRO algorithm suitable for DL. We prove the convergence of our proposed algorithm for non-convex objectives, the standard setting for DL models. Empirical evaluation demonstrates that ALSO outperforms existing baselines.
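For illustration of the general idea of DRO-style sample reweighting mentioned in the abstract (not the paper's ALSO algorithm, whose details are not given here), a minimal sketch assuming a KL-divergence uncertainty set, under which worst-case weights reduce to a softmax of per-sample losses with a hypothetical temperature parameter `tau`:

```python
# Illustrative sketch only: DRO-style reweighting of per-sample losses.
# NOT the paper's ALSO algorithm; assumes a KL-ball uncertainty set, so
# the adversarial weights are a softmax of per-sample losses scaled by a
# hypothetical temperature `tau`.
import torch

def dro_weighted_loss(per_sample_losses: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Return a weighted loss in which higher-loss samples get larger weights."""
    # Detach so the weights act as fixed importance coefficients in the gradient.
    weights = torch.softmax(per_sample_losses.detach() / tau, dim=0)
    return (weights * per_sample_losses).sum()

# Usage: replace the usual mean loss in a standard training step.
logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
losses = torch.nn.functional.cross_entropy(logits, targets, reduction="none")
loss = dro_weighted_loss(losses, tau=0.5)
loss.backward()
```

As `tau` grows the weights approach the uniform average (standard ERM); as `tau` shrinks the objective concentrates on the hardest samples in the batch.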
Submission Number: 59