Reviewed Version (pdf): https://openreview.net/references/pdf?id=j9CzCm9Dq6
Keywords: Heterogeneous model transfer, pretraining-finetuning
Abstract: We propose an effective heterogeneous model transfer (HMT) method that transfers knowledge from one pretrained neural network to another. Most existing deep learning methods depend heavily on a pretraining-finetuning strategy, i.e., pretraining a deep model on a large task-related (source) dataset and finetuning it on a small target dataset. Pretraining provides a universal feature representation for the target learning task and thus reduces overfitting on a small target dataset. However, it is usually assumed that the pretrained model and the target model share an identical backbone, which significantly limits the scalability of pretrained deep models. This paper relaxes this limitation and generalizes to heterogeneous model transfer between two different neural networks. Specifically, we select the longest chain from the source model and transfer it to the longest chain of the target model. Motivated by one-shot neural architecture search methods, the longest chain inherits merits from the source model and also serves as a weight-sharing path of the target model, and thus provides a good initialization. Given the two longest chains, weights are transferred layer by layer using bilinear interpolation and cyclic stacking. HMT opens a new window for the pretraining-finetuning strategy and significantly improves the reuse efficiency of pretrained models, since no re-pretraining on the large source dataset is required. Experiments on several datasets show the effectiveness of HMT. Anonymous code is at: https://anonymous.4open.science/r/6ab184dc-3c64-4fdd-ba6d-1e5097623dfd/
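The abstract names two operations for layer-to-layer weight transfer, bilinear interpolation and cyclic stack, without giving details. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that each layer's weights can be treated as a 2-D matrix, that interpolation resizes a source matrix to the target layer's shape, and that cyclic stack reuses source layers in order when the target chain is longer than the source chain. The function names `bilinear_resize`, `cyclic_stack`, and `transfer_chain` are hypothetical.

```python
import numpy as np


def bilinear_resize(w, out_rows, out_cols):
    """Resize a 2-D weight matrix to (out_rows, out_cols) by bilinear interpolation."""
    in_rows, in_cols = w.shape
    # Sample positions expressed in source coordinates (corners are aligned).
    r = np.linspace(0, in_rows - 1, out_rows)
    c = np.linspace(0, in_cols - 1, out_cols)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, in_rows - 1)
    c1 = np.minimum(c0 + 1, in_cols - 1)
    fr = (r - r0)[:, None]  # fractional row offsets
    fc = (c - c0)[None, :]  # fractional column offsets
    # Interpolate along columns first, then along rows.
    top = w[r0][:, c0] * (1 - fc) + w[r0][:, c1] * fc
    bottom = w[r1][:, c0] * (1 - fc) + w[r1][:, c1] * fc
    return top * (1 - fr) + bottom * fr


def cyclic_stack(source_layers, target_len):
    """Assumed reading of 'cyclic stack': repeat source layers in order to fill the target chain."""
    n = len(source_layers)
    return [source_layers[i % n] for i in range(target_len)]


def transfer_chain(source_chain, target_shapes):
    """Build an initialization for the target chain from the source chain's weights."""
    stacked = cyclic_stack(source_chain, len(target_shapes))
    return [bilinear_resize(w, rows, cols) for w, (rows, cols) in zip(stacked, target_shapes)]


# Toy usage: a 2-layer source chain initializes a 3-layer target chain.
source_chain = [np.array([[0.0, 2.0], [4.0, 6.0]]), np.ones((3, 3))]
target_init = transfer_chain(source_chain, [(3, 3), (2, 4), (3, 3)])
```

In this toy run the third target layer is initialized from the first source layer again, since the source chain has only two layers. Real convolutional weights are 4-D, so an actual implementation would need to decide which axes to interpolate; that choice is left open here.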
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We propose an effective heterogeneous model transfer (HMT) method that transfers knowledge from one pretrained neural network to another.