Best Practices for Fine-Tuning Visual Classifiers to New Domains

Brian Chu, Vashisht Madhavan, Oscar Beijbom, Judy Hoffman, Trevor Darrell

2016 (modified: 16 Jul 2019) | ECCV Workshops (3) 2016 | Readers: Everyone

Abstract: Recent studies have shown that features from deep convolutional neural networks learned using large labeled datasets, like ImageNet, provide effective representations for a variety of visual recognition tasks. They achieve strong performance as generic features and are even more effective when fine-tuned to target datasets. However, the details of the fine-tuning procedure across datasets and with different amounts of labeled data are not well studied, and choosing the best fine-tuning method is often left to trial and error. In this work we systematically explore the design space for fine-tuning and give recommendations based on two key characteristics of the target dataset: its visual distance from the source dataset and the amount of available training data. Through a comprehensive experimental analysis, we conclude, with a few exceptions, that it is best to copy as many layers of a pre-trained network as possible, and then adjust the level of fine-tuning based on the visual distance from the source.
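
The two choices named in the abstract (how many layers to copy from the pre-trained network, and how many of those to fine-tune) can be written down as a per-layer plan. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the names `LayerPlan`, `make_finetune_plan`, `num_copied`, and `num_frozen` and the layer names are hypothetical, and the abstract gives no numeric rule for picking the freeze depth, so that value is left as an input.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerPlan:
    name: str
    copied: bool     # initialize this layer from the pre-trained network
    trainable: bool  # update this layer's weights during fine-tuning


def make_finetune_plan(layer_names: List[str],
                       num_copied: int,
                       num_frozen: int) -> List[LayerPlan]:
    """Build a per-layer plan, with layers ordered from input to output.

    The first `num_copied` layers start from pre-trained weights; the rest
    are initialized from scratch. The first `num_frozen` layers are kept
    fixed; every later layer is trained. Both counts are hypothetical knobs
    chosen by the caller, not values taken from the paper.
    """
    if not 0 <= num_frozen <= num_copied <= len(layer_names):
        raise ValueError("need 0 <= num_frozen <= num_copied <= number of layers")
    return [
        LayerPlan(name, copied=(i < num_copied), trainable=(i >= num_frozen))
        for i, name in enumerate(layer_names)
    ]


# Hypothetical 8-layer network. All layers except the final classifier are
# copied (assuming the target label set differs from the source), and the
# first two layers are frozen as an arbitrary illustrative choice.
layers = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8"]
plan = make_finetune_plan(layers, num_copied=7, num_frozen=2)
for p in plan:
    print(p.name, "copied" if p.copied else "new", "train" if p.trainable else "frozen")
```

In this sketch, following the abstract's recommendation would mean keeping `num_copied` as large as possible and varying only `num_frozen` with the visual distance between source and target; how that mapping should be set is something the abstract does not specify.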
