Inheriting Generalized Learngene for Efficient Knowledge Transfer across Multiple Tasks

Published: 01 Jan 2025, Last Modified: 15 May 2025 · AAAI 2025 · CC BY-SA 4.0
Abstract: In practical applications, it is often necessary to transfer knowledge from large pretrained models to small models with various architectures for tackling different tasks. The recently proposed Learngene framework first extracts a compact module, termed the learngene, from a large well-trained model; the learngene is then used to build descendant models for handling diverse tasks. In this paper, we aim to extract and inherit a learngene that generalizes across different model architectures and tasks, a setting that remains understudied in previous works. Inspired by existing observations that large-kernel convolutional neural networks (CNNs) exhibit significant generalization potential across various architectures and tasks, we propose a novel two-stage Learngene method termed CLKG (Convolutional Learngene for Knowledge Generalization), which inherits convolutional kernels containing generalized knowledge as the learngene to build diverse models for multiple tasks. Specifically, we construct an auxiliary model composed of small kernels and train it through dense feature distillation to inherit the feature extraction ability of large-kernel CNNs. After distillation, we select certain kernels from the auxiliary model as the learngene based on three criteria: direct kernel extraction, priority to edge kernels, and continuous kernel selection. We then adapt the learngene to the width of each descendant model and use it to initialize the descendant model's backbone. Experiments on diverse vision tasks such as image classification, object detection, and semantic segmentation demonstrate the superiority of CLKG. For example, compared with training from scratch, it brings a 2.89% improvement on VOC12+SBD and achieves better results with roughly half the training data and training epochs. Furthermore, compared with knowledge distillation, CLKG significantly reduces negative transfer on certain datasets, e.g., achieving a 1.88% performance improvement on the NAO dataset despite domain differences.
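To make the two-stage pipeline described in the abstract concrete, the following is a minimal PyTorch sketch of the overall flow: training a small-kernel auxiliary model via dense feature distillation from a large-kernel teacher, selecting contiguous edge-layer kernels as the learngene, and adapting them to a descendant model's width. All class and function names (AuxiliaryCNN, dense_feature_distillation_step, select_learngene, init_descendant) and the channel-cropping width adaptation are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the CLKG pipeline; names and details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AuxiliaryCNN(nn.Module):
    """Small-kernel auxiliary model trained to mimic a large-kernel teacher."""
    def __init__(self, channels=64, depth=4, kernel_size=3):
        super().__init__()
        layers = [nn.Conv2d(3, channels, kernel_size, padding=kernel_size // 2)]
        for _ in range(depth - 1):
            layers += [nn.ReLU(inplace=True),
                       nn.Conv2d(channels, channels, kernel_size,
                                 padding=kernel_size // 2)]
        self.backbone = nn.Sequential(*layers)

    def forward(self, x):
        return self.backbone(x)


def dense_feature_distillation_step(student, teacher, images, optimizer):
    """Stage 1: align student features with the large-kernel teacher's features
    at every spatial location (dense feature distillation)."""
    with torch.no_grad():
        t_feat = teacher(images)          # teacher feature map, same shape assumed
    s_feat = student(images)
    loss = F.mse_loss(s_feat, t_feat)     # per-location feature matching
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def select_learngene(student, num_layers=2):
    """Stage 2 (simplified): continuous kernel selection starting from the
    edge (earliest) convolutional layers of the distilled auxiliary model."""
    convs = [m for m in student.modules() if isinstance(m, nn.Conv2d)]
    return [convs[i].weight.detach().clone() for i in range(num_layers)]


def init_descendant(descendant, learngene):
    """Adapt the learngene to the descendant's width and copy it into the
    backbone. Assumes the descendant is narrower than the auxiliary model
    and uses the same kernel size."""
    convs = [m for m in descendant.modules() if isinstance(m, nn.Conv2d)]
    for conv, kernel in zip(convs, learngene):
        out_c, in_c = conv.weight.shape[:2]
        conv.weight.data.copy_(kernel[:out_c, :in_c])  # crop channels to match width
```

Under these assumptions, a descendant model would be built by calling select_learngene on the distilled auxiliary model and passing the result to init_descendant before fine-tuning on the target task; the remaining (non-backbone) layers are left randomly initialized.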