Differentiable Hyper-parameter Optimization

Published: 28 Jan 2022, Last Modified: 13 Feb 2023
ICLR 2022 Submitted
Readers: Everyone
Keywords: Hyper-parameter Optimization
Abstract: Hyper-parameters are ubiquitous in machine learning. In particular, network layers involve many hyper-parameters, such as kernel size, channel size, and hidden layer size, which directly affect model performance. Hyper-parameter optimization is therefore crucial for machine learning. Current hyper-parameter optimization methods typically require multiple training sessions, which is very time-consuming. To address this problem, we propose a method that tunes a neural network's hyper-parameters efficiently, completing the optimization in a single training session. We apply our method to the hyper-parameters of various neural network layers and compare it with multiple benchmark hyper-parameter optimization methods. Experimental results show that our method is commonly 10 times faster than traditional and mainstream methods such as random search, Bayesian optimization, and other state-of-the-art models, while also finding higher-quality hyper-parameters that yield better accuracy and stronger stability.
One-sentence Summary: We tune channel size and other hyper-parameters in a single training session by making these hyper-parameters differentiable.
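
Illustrative sketch (not from the paper): the submission page carries no code, but the core idea of making a discrete hyper-parameter such as channel size differentiable can be illustrated with a learnable soft channel mask trained jointly with the weights. The class name, the PyTorch framing, and the sigmoid-mask relaxation below are all assumptions for illustration, not the authors' actual method.

import torch
import torch.nn as nn

class SoftWidthConv(nn.Module):
    """Conv layer whose effective channel count is a learnable scalar."""
    def __init__(self, in_ch, max_out_ch, temperature=10.0):
        super().__init__()
        # Over-provision channels; the mask decides how many are "active".
        self.conv = nn.Conv2d(in_ch, max_out_ch, kernel_size=3, padding=1)
        # Continuous surrogate for the channel-size hyper-parameter,
        # initialized to half the maximum width (an arbitrary choice here).
        self.width = nn.Parameter(torch.tensor(max_out_ch / 2.0))
        self.register_buffer("idx", torch.arange(max_out_ch, dtype=torch.float32))
        self.temperature = temperature

    def forward(self, x):
        # Smooth step: channels with index below `width` pass nearly
        # unchanged, higher-index channels are damped toward zero.
        mask = torch.sigmoid(self.temperature * (self.width - self.idx))
        return self.conv(x) * mask.view(1, -1, 1, 1)

layer = SoftWidthConv(in_ch=3, max_out_ch=64)
out = layer(torch.randn(2, 3, 32, 32))
loss = out.pow(2).mean()
loss.backward()                        # gradients flow into layer.width too
print(int(layer.width.round().item())) # discretized channel size after training

Because the mask is differentiable in `width`, a single training session updates both the weights and the channel-size surrogate by gradient descent; rounding `width` at the end recovers a discrete channel size.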