- TL;DR: This paper re-examines several common practices of setting hyper-parameters for fine-tuning.
- Abstract: Fine-tuning from pre-trained ImageNet models has become the de-facto standard for various computer vision tasks. Current practices for fine-tuning typically involve selecting an ad-hoc choice of hyper-parameters and keeping them fixed to values normally used for training from scratch. This paper re-examines several common practices of setting hyper-parameters for fine-tuning. Our findings are based on extensive empirical evaluation for fine-tuning on various transfer learning benchmarks. (1) While prior works have thoroughly investigated learning rate and batch size, momentum for fine-tuning is a relatively unexplored parameter. We find that picking the right value for momentum is critical for fine-tuning performance and connect it with previous theoretical findings. (2) Optimal hyper-parameters for fine-tuning in particular the effective learning rate are not only dataset dependent but also sensitive to the similarity between the source domain and target domain. This is in contrast to hyper-parameters for training from scratch. (3) Reference-based regularization that keeps models close to the initial model does not necessarily apply for "dissimilar" datasets. Our findings challenge common practices of fine- tuning and encourages deep learning practitioners to rethink the hyper-parameters for fine-tuning.
- Keywords: fine-tuning, hyperparameter search. transfer learning