Aligning Distributionally Robust Optimization with Practical Deep Learning Needs

Published: 22 Sept 2025 · Last Modified: 01 Dec 2025 · NeurIPS 2025 Workshop · CC BY 4.0
Keywords: optimization, distributionally robust optimization, deep learning
TL;DR: We introduce ALSO, an adaptive distributionally robust optimizer for deep learning, prove its convergence in the non-convex setting, and provide an empirical evaluation.
Abstract: While traditional Deep Learning (DL) optimization methods treat all training samples equally, Distributionally Robust Optimization (DRO) adaptively assigns importance weights to different samples. However, a significant gap exists between DRO and current DL practice: modern DL optimizers must be adaptive and able to handle stochastic gradients, since such methods demonstrate superior performance. This paper bridges this gap by introducing ALSO -- Adaptive Loss Scaling Optimizer -- an adaptive DRO algorithm suitable for DL. We prove the convergence of the proposed algorithm for non-convex objectives, the standard setting for DL models. Empirical evaluation demonstrates that ALSO outperforms baseline optimizers.
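The abstract does not spell out ALSO's update rule, but the core DRO idea it builds on, upweighting high-loss samples within a batch instead of averaging them uniformly, can be illustrated with a minimal sketch. The snippet below shows a generic KL-regularized DRO reweighting of per-sample losses; the function name and `temperature` parameter are assumptions for illustration, not ALSO's actual algorithm.

```python
import torch

def dro_weighted_loss(per_sample_losses: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Generic KL-regularized DRO reweighting (illustrative sketch, not ALSO's exact update).

    Higher-loss samples receive exponentially larger importance weights;
    as temperature grows large, this recovers the ordinary mean loss.
    """
    # Weights are computed without gradient flow, so the model is trained
    # on the reweighted loss rather than on the weighting itself.
    with torch.no_grad():
        weights = torch.softmax(per_sample_losses / temperature, dim=0)
    return (weights * per_sample_losses).sum()

# Usage inside a training step (loss_fn must use reduction="none"):
#   losses = loss_fn(model(x), y)               # shape: (batch_size,)
#   loss = dro_weighted_loss(losses, temperature=0.5)
#   loss.backward()
```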
Submission Number: 59