On the Tunability of Optimizers in Deep LearningDownload PDF

25 Sep 2019 (modified: 24 Dec 2019)ICLR 2020 Conference Blind SubmissionReaders: Everyone
  • Original Pdf: pdf
  • Keywords: Optimization, Benchmarking, Hyperparameter optimization
  • TL;DR: We provide a method to benchmark optimizers that is cognizant to the hyperparameter tuning process.
  • Abstract: There is no consensus yet on the question whether adaptive gradient methods like Adam are easier to use than non-adaptive optimization methods like SGD. In this work, we fill in the important, yet ambiguous concept of ‘ease-of-use’ by defining an optimizer’s tunability: How easy is it to find good hyperparameter configurations using automatic random hyperparameter search? We propose a practical and universal quantitative measure for optimizer tunability that can form the basis for a fair optimizer benchmark. Evaluating a variety of optimizers on an extensive set of standard datasets and architectures, we find that Adam is the most tunable for the majority of problems, especially with a low budget for hyperparameter tuning.
19 Replies