Keywords: Model Initialization, Vision Transformers, Clustering
Abstract: In recent years, the combination of vast datasets and powerful computational resources has led to the emergence of large pre-trained models in deep learning. However, common practice often overgeneralizes the applicability of these models and overlooks task-specific resource constraints. To mitigate this issue, we propose Cluster-Learngene, which condenses knowledge from an ancestry model and then initializes descendant models with varying scales of attention heads. Specifically, our method adaptively clusters the attention heads of each layer in the ancestry model based on their density characteristics and extracts the centroids of the resulting clusters as the learngene. Moreover, we introduce a priority weight-sharing strategy that expands the learngene to initialize descendant models with varying scales of attention heads. Through extensive experiments, we demonstrate that Cluster-Learngene is not only more efficient than other initialization methods but also customizes models with varying scales of attention heads according to the resources available for the downstream task.
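The pipeline summarized in the abstract (cluster the heads of a layer by density, keep the centroids, then expand them to a chosen head count) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: each head is represented as a flat weight vector, the grouping rule is a simplified stand-in (heads within a fixed radius `eps` are chained together, similar to DBSCAN with a minimum cluster size of one) rather than the adaptive clustering of the paper, and the "priority" rule is guessed to mean that centroids of larger clusters are reused first. The function names `density_cluster`, `extract_learngene`, and `expand_heads` are hypothetical.

```python
import numpy as np


def density_cluster(heads, eps):
    """Group heads by chaining eps-neighbours (simplified, assumed rule).

    heads: array of shape (num_heads, dim), one flattened weight vector per head.
    Returns an integer cluster label for each head.
    """
    n = len(heads)
    # Pairwise Euclidean distances between head vectors.
    dist = np.linalg.norm(heads[:, None, :] - heads[None, :, :], axis=-1)
    labels = -np.ones(n, dtype=int)
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        # Grow a new cluster from head i by following eps-neighbours.
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            for k in np.flatnonzero(dist[j] <= eps):
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels


def extract_learngene(heads, eps):
    """Return cluster centroids sorted by cluster size, largest first."""
    labels = density_cluster(heads, eps)
    ids, counts = np.unique(labels, return_counts=True)
    centroids = np.stack([heads[labels == c].mean(axis=0) for c in ids])
    # Assumed priority: centroids of larger clusters come first.
    order = np.argsort(-counts, kind="stable")
    return centroids[order], counts[order]


def expand_heads(centroids, num_heads):
    """Initialize num_heads heads by cycling through centroids in priority order."""
    idx = [i % len(centroids) for i in range(num_heads)]
    return centroids[idx]
```

Under these assumptions, a layer whose heads fall into three groups yields three centroids, and `expand_heads(centroids, 4)` produces a four-head initialization in which the highest-priority centroid is shared by two heads.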
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 317