Keywords: pre-training, fine-tuning, generalization theory
Abstract: Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications, especially for small data sets. However, recent studies have empirically shown that this training strategy offers almost no benefit over training from scratch in computer vision tasks. In this work, we first revisit this observation from the perspective of generalization analysis, which is popular in learning theory. Our theory reveals that the final prediction precision has only a weak dependency on the pre-trained model. Besides the pre-trained model itself, the data used for pre-training are often also available at fine-tuning time, and this observation inspires us to leverage pre-training data during fine-tuning. Our theoretical analysis shows that the final performance on the target data can be improved when appropriate pre-training data are included in fine-tuning. Therefore, we propose to select a subset of the pre-training data to help the optimization on the target data, and we develop a novel selection algorithm based on our analysis. Extensive experiments on 8 benchmark data sets verify the effectiveness of the proposed fine-tuning pipeline.
One-sentence Summary: We propose an improved fine-tuning paradigm that leverages pre-training data, based on our theoretical analysis of pre-training and fine-tuning.