Abstract: Pre-trained models have become the preferred backbone as model parameter counts continue to grow. However, traditional pre-trained models face deployment challenges because of their fixed sizes, and they are prone to negative transfer when training tasks diverge from target tasks.
To address this, we propose **KIND**, a novel pre-training method designed to construct decomposable models.
KIND integrates knowledge by incorporating Singular Value Decomposition (SVD) as a structural constraint, with each basic component represented as the combination of a column vector of $U$, a singular value from $\Sigma$, and a row vector of $V^\top$.
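As a rough, hypothetical sketch (not the paper's released code), such an SVD-parameterized layer could be written as follows; the class name `SVDLinear`, the initialization scale, and the `keep` argument for selecting components are all assumptions:

```python
from typing import Optional

import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """Linear layer stored directly as SVD factors, so the weight is a sum of
    rank-1 components W = sum_i sigma_i * u_i v_i^T that can later be kept,
    dropped, or regrouped individually."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) * 0.02)  # column vectors u_i
        self.sigma = nn.Parameter(torch.ones(rank))                    # singular values sigma_i
        self.V = nn.Parameter(torch.randn(in_features, rank) * 0.02)   # rows of V^T, stored as columns

    def weight(self, keep: Optional[torch.Tensor] = None) -> torch.Tensor:
        # Optionally rebuild the weight from a subset of components,
        # e.g. learngenes only, for a smaller deployment.
        U, s, V = self.U, self.sigma, self.V
        if keep is not None:
            U, s, V = U[:, keep], s[keep], V[:, keep]
        return (U * s) @ V.T  # sum_i sigma_i * u_i v_i^T

    def forward(self, x: torch.Tensor, keep: Optional[torch.Tensor] = None) -> torch.Tensor:
        return x @ self.weight(keep).T
```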
These components are categorized into **learngenes**, which encapsulate class-agnostic knowledge, and **tailors**, which capture class-specific knowledge, with a class gate mechanism diverting knowledge between the two groups during training, as sketched below.
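A minimal sketch of how such gating could work, assuming a fixed block of learngene components shared across classes and a per-class block of tailor components (the names and mask layout are hypothetical, not taken from the paper):

```python
import torch
import torch.nn as nn

class ClassGate(nn.Module):
    """Hypothetical class gate: learngene components are always active, while
    each sample additionally activates only the tailor components assigned to
    its class, so class-specific gradients stay confined to those tailors."""

    def __init__(self, num_classes: int, num_learngenes: int, tailors_per_class: int):
        super().__init__()
        self.num_learngenes = num_learngenes
        self.tailors_per_class = tailors_per_class
        self.total = num_learngenes + num_classes * tailors_per_class

    def forward(self, labels: torch.Tensor) -> torch.Tensor:
        # labels: (batch,) integer class ids -> (batch, total) 0/1 component mask.
        mask = torch.zeros(labels.size(0), self.total, device=labels.device)
        mask[:, : self.num_learngenes] = 1.0  # shared, class-agnostic components
        start = self.num_learngenes + labels * self.tailors_per_class
        idx = start.unsqueeze(1) + torch.arange(self.tailors_per_class, device=labels.device)
        mask.scatter_(1, idx, 1.0)  # class-specific tailor components
        return mask
```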
Extensive experiments demonstrate that models pre-trained with KIND can be decomposed into learngenes and tailors, which can be adaptively recombined for diverse resource-constrained deployments.
Moreover, for tasks with large domain shifts, transferring only the learngenes, which carry class-agnostic knowledge, and combining them with randomly initialized tailors effectively mitigates the impact of the shift.
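Under the same assumptions as the `SVDLinear` sketch above, this transfer recipe amounts to copying the learngene components and leaving the tailor components at their random initialization (the helper `transfer_learngenes` is hypothetical):

```python
import torch

@torch.no_grad()
def transfer_learngenes(pretrained: "SVDLinear", target: "SVDLinear", num_learngenes: int):
    """Copy the first `num_learngenes` SVD components (the learngenes) from a
    pre-trained SVDLinear into a fresh one; the remaining tailor components in
    `target` keep their random initialization for fine-tuning on the new task."""
    lg = slice(0, num_learngenes)
    target.U[:, lg] = pretrained.U[:, lg]
    target.sigma[lg] = pretrained.sigma[lg]
    target.V[:, lg] = pretrained.V[:, lg]
```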
Code will be made available at https://github.com/Te4P0t/KIND.
Lay Summary: Modern AI models often rely on pre-trained backbones, but these models are typically fixed in size and don’t adapt well when applied to different tasks—especially if there’s a big difference between training and deployment scenarios. To overcome this, we introduce KIND, a new pre-training approach that builds models out of smaller, reusable pieces. KIND uses a mathematical technique called Singular Value Decomposition (SVD) to break knowledge into basic components. These components are grouped into learngenes, which carry general-purpose knowledge, and tailors, which adapt to specific tasks. During training, a mechanism called a “class gate” separates these two types of knowledge. Experiments show that models trained with KIND can be flexibly adapted to different devices or domains by recombining the components. In particular, transferring just the general-purpose learngenes proves effective in new tasks with limited resources or large domain differences. Code is available at: https://github.com/Te4P0t/KIND.
Primary Area: Deep Learning->Algorithms
Keywords: Model Initialization, Decomposable Pre-trained Models, Diffusion Transformers
Submission Number: 2104