On the Tunability of Optimizers in Deep Learning

Prabhu Teja S*; Florian Mai*; Thijs Vogels; Martin Jaggi; Francois Fleuret

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Blind Submission, Readers: Everyone

Keywords: Optimization, Benchmarking, Hyperparameter optimization

TL;DR: We provide a method to benchmark optimizers that is cognizant of the hyperparameter tuning process.

Abstract: There is no consensus yet on whether adaptive gradient methods like Adam are easier to use than non-adaptive optimization methods like SGD. In this work, we make the important yet ambiguous concept of ‘ease-of-use’ concrete by defining an optimizer’s tunability: how easy is it to find good hyperparameter configurations using automatic random hyperparameter search? We propose a practical and universal quantitative measure for optimizer tunability that can form the basis for a fair optimizer benchmark. Evaluating a variety of optimizers on an extensive set of standard datasets and architectures, we find that Adam is the most tunable for the majority of problems, especially with a low budget for hyperparameter tuning.
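To make the idea of tuning under a budget concrete, the following is a minimal sketch of random hyperparameter search at several budgets. It is an illustration only and not the tunability measure proposed in the paper, which the abstract does not define. The function names, the toy loss, and the log-uniform search range are all assumptions made for this example.

```python
import math
import random
import statistics


def toy_validation_loss(lr):
    # Hypothetical stand-in for "train a model and evaluate it":
    # the loss is lowest when the learning rate is near 1e-2.
    return (math.log10(lr) + 2.0) ** 2


def best_loss_after_budget(budget, rng):
    # Random search: draw `budget` learning rates log-uniformly
    # from [1e-5, 1] (an assumed range) and keep the best loss seen.
    losses = [toy_validation_loss(10 ** rng.uniform(-5, 0)) for _ in range(budget)]
    return min(losses)


def expected_best_loss(budget, repeats=200, seed=0):
    # Average the best-found loss over many independent searches,
    # so the result reflects the budget rather than one lucky draw.
    rng = random.Random(seed)
    return statistics.mean(best_loss_after_budget(budget, rng) for _ in range(repeats))


if __name__ == "__main__":
    for budget in (1, 4, 16, 64):
        print(budget, expected_best_loss(budget))
```

Under this toy setup, an optimizer whose curve of expected best loss falls quickly at small budgets would be the easier one to tune in the sense the abstract describes.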

