Learngene Tells You How to Customize: Task-Aware Parameter Initialization at Flexible Scales

Published: 01 May 2025 · Last Modified: 23 Jul 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: Task-Aware Parameter Initialization at Flexible Scales
Abstract: Appropriate parameter initialization strategies are essential for reducing the high computational costs of training large pretrained models across diverse task scenarios. Graph HyperNetwork (GHN), a parameter initialization method, has recently demonstrated strong performance in initializing models. However, GHN still faces several challenges: limited effectiveness in initializing larger models, poor performance on smaller datasets, and the requirement of task-specific GHN training, where each new task necessitates retraining the GHN model, leading to increased computational and storage overhead. To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called **T**ask-**A**ware **L**earngene (**TAL**). Briefly, our approach pretrains a TAL model under the guidance of a well-trained model and then performs multi-task tuning to obtain a shared TAL model that enables parameter prediction based on both model architectures and task-specific characteristics. Extensive experiments show the superiority of TAL: models initialized with TAL outperform those initialized with the GHN method by an average of 24.39% in accuracy across the Decathlon datasets.
Lay Summary: Modern AI models often require enormous computing resources to train, especially when they are adapted to new tasks. One way to make this process more efficient is to give these models a good “starting point” — a smart guess of what their internal settings should be, before training begins. An existing method called Graph HyperNetwork (GHN) has shown promise in doing this. However, GHN struggles when models are large or when data is limited. It also needs to be retrained from scratch every time a new task comes up, which is time-consuming and inefficient. To address these problems, we introduce a new method called Task-Aware Learngene (TAL). TAL learns from a strong, existing model and then adapts to many tasks at once. This way, it can make smart predictions about how a model should start — not just based on the model's structure, but also on the nature of the task it will solve.
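The task-aware parameter prediction described above can be illustrated with a minimal sketch: a small hypernetwork that outputs a layer's initial weights conditioned on both an architecture descriptor and a task embedding. All names, dimensions, and the two-layer MLP itself are hypothetical illustrations, not the paper's actual TAL architecture.

```python
import numpy as np

# Illustrative sketch only: a tiny "hypernetwork" that predicts a target
# layer's weight matrix from an architecture descriptor plus a task
# embedding, in the spirit of task-aware parameter prediction. All sizes
# and the MLP structure are assumptions, not taken from the paper.

rng = np.random.default_rng(0)

ARCH_DIM, TASK_DIM, HIDDEN = 16, 8, 64
OUT_SHAPE = (32, 32)  # shape of the layer whose weights we predict

# Hypernetwork parameters (a two-layer MLP).
W1 = rng.normal(scale=0.1, size=(ARCH_DIM + TASK_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, OUT_SHAPE[0] * OUT_SHAPE[1]))

def predict_weights(arch_desc, task_emb):
    """Predict initialization weights conditioned on architecture and task."""
    cond = np.concatenate([arch_desc, task_emb])  # joint conditioning vector
    hidden = np.maximum(0.0, cond @ W1)           # ReLU hidden layer
    return (hidden @ W2).reshape(OUT_SHAPE)       # flat output -> weight matrix

arch_desc = rng.normal(size=ARCH_DIM)  # e.g. encodes layer type/width (hypothetical)
task_emb = rng.normal(size=TASK_DIM)   # e.g. summarizes the target dataset (hypothetical)
w_init = predict_weights(arch_desc, task_emb)
print(w_init.shape)  # (32, 32)
```

A single shared hypernetwork of this form can serve many architectures and tasks by varying its two conditioning inputs, which is what avoids retraining a separate predictor per task.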
Link To Code: https://github.com/mathieuxu/Task-Aware-Learngene
Primary Area: Deep Learning->Algorithms
Keywords: model initialization, Learngene, hypernetwork
Submission Number: 8661