Optimization and Generalizability: Fair Benchmarking for Stochastic Algorithms

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: non-convex optimization; stochastic optimization algorithms; deep learning; generalizability
TL;DR: Novel noise-enabled optimization algorithms and benchmarking that accounts for optimizer stochasticity and complex loss landscape
Abstract: We currently lack a full understanding of what makes a good optimizer for deep learning and of whether improved optimization performance confers higher generalizability. The current literature neglects an innate characteristic of SGD and its variants, their stochasticity, and thus fails to benchmark these algorithms fairly or to reveal their performance in a statistical sense. This paper fills that gap. Unlike existing work, which evaluates the end point of a single navigation/optimization trajectory, we utilize and sample from an ensemble of many optimization trajectories so that we can estimate the stationary distribution of a stochastic optimizer. We cast a wide net and include SGD and its noise-enabled variants, flat-minima optimizers, and novel algorithms we debut in this paper by recasting and broadening SGD under the Basin Hopping (BH) framework. Our evaluation considers both synthetic functions with known global and local minima of varying flatness and real-world problems in computer vision and natural language processing. Fair benchmarking accounts for this statistical setting, comparing stationary distributions and establishing statistical significance. Our paper reveals several findings on the relationship between training loss and hold-out accuracy and on the comparable performance of SGD, its noise-enabled variants, and the novel BH-based optimizers; indeed, these algorithms match the performance of flat-minima optimizers such as SAM with half the gradient evaluations. We hope that this work will support further research in deep learning optimization that relies not on single models but instead accounts for the stochasticity of optimizers.
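Below is a minimal sketch of the benchmarking protocol the abstract describes, under stated assumptions: the double-well objective, the step sizes, and the Metropolis-style Basin Hopping loop are illustrative stand-ins rather than the paper's actual algorithms. The key idea it illustrates is treating many independent trajectories per optimizer as samples from (an approximation of) its stationary distribution and comparing the resulting distributions with a rank-based significance test instead of comparing two single runs.

```python
# Hypothetical sketch: estimate an optimizer's end-point distribution from an
# ensemble of trajectories and compare optimizers statistically.
# The objective, hyperparameters, and BH-style loop are illustrative assumptions.
import numpy as np
from scipy.stats import mannwhitneyu


def loss(x):
    """1-D double well: global minimum near x = -1.8, local minimum near x = 1.6."""
    return 0.25 * x**4 - 1.5 * x**2 + 0.5 * x


def noisy_grad(x, rng, sigma=0.5):
    """Exact gradient plus Gaussian noise, standing in for a minibatch gradient."""
    return x**3 - 3.0 * x + 0.5 + sigma * rng.standard_normal()


def sgd_trajectory(rng, steps=500, lr=0.05):
    """One SGD run from a random start; returns the end-point loss."""
    x = rng.uniform(-3.0, 3.0)
    for _ in range(steps):
        x -= lr * noisy_grad(x, rng)
    return loss(x)


def bh_sgd_trajectory(rng, hops=10, steps_per_hop=50, lr=0.05, jump=1.0, temp=0.1):
    """Basin-Hopping-style run: short SGD descents separated by random jumps,
    with a Metropolis accept/reject on the loss reached in each new basin."""
    best = rng.uniform(-3.0, 3.0)
    for _ in range(hops):
        cand = best + jump * rng.standard_normal()   # perturb the current basin
        for _ in range(steps_per_hop):               # local descent with SGD
            cand -= lr * noisy_grad(cand, rng)
        delta = loss(cand) - loss(best)
        if delta < 0 or rng.random() < np.exp(-delta / temp):
            best = cand                              # accept the new basin
    return loss(best)


if __name__ == "__main__":
    n_runs = 200  # ensemble of independent trajectories per optimizer
    rng = np.random.default_rng(0)
    sgd_losses = np.array([sgd_trajectory(rng) for _ in range(n_runs)])
    bh_losses = np.array([bh_sgd_trajectory(rng) for _ in range(n_runs)])

    # Compare the two end-point distributions rather than two single runs.
    stat, p = mannwhitneyu(sgd_losses, bh_losses, alternative="two-sided")
    print(f"SGD    median end loss: {np.median(sgd_losses):.3f}")
    print(f"BH-SGD median end loss: {np.median(bh_losses):.3f}")
    print(f"Mann-Whitney U = {stat:.1f}, p = {p:.3g}")
```

The same protocol carries over to the real-world settings mentioned in the abstract by swapping the synthetic gradient oracle for minibatch gradients of a network and the end-point loss for hold-out accuracy; only the per-trajectory sampling and the distribution-level comparison matter for the benchmarking argument.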
Supplementary Material: pdf
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2694