Generalization by Specialization: Unveiling Specialized Subnetworks in Large Language Models

25 Sept 2024 (modified: 25 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: LLM; Subnetworks; Generalization
Abstract: In recent years, large language models (LLMs) have exhibited remarkable generalization capabilities. Previous studies have largely focused on examining the generalization mechanisms of smaller models to draw inferences about similar mechanisms in larger language models. However, these smaller models typically possess limited generalization capacity. In this study, we explore the generalization mechanisms of billion-parameter language models, with particular attention to publicly available models such as LLaMA and Gemma. Our findings reveal that weight activations exhibit task-specific behavior, indicating that not all weights are necessary for task performance. Building on this insight, we introduce a parameter probing method to identify subnetworks optimized for specific tasks without extensive fine-tuning. This method sorts and groups weight activations and then prunes the less significant groups based on a small validation set. Furthermore, our results show that subnetworks specialized for domain-specific tasks achieve improved performance and generalization within their respective domains, but their performance deteriorates across different domains. This study presents a novel perspective on the generalization of LLMs: the strength of large language models lies in their multiplicity of domain-specific subnetworks, which allows them to excel on a variety of in-domain tasks.
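The abstract describes the probing method only at a high level (sort and group weight activations, then prune the less significant groups against a small validation set). The following is a minimal sketch of that idea in PyTorch; the grouping granularity (output units of a single linear layer), the group count, the greedy accept/revert loop, and the `tolerance` threshold are all illustrative assumptions rather than the authors' exact procedure.

```python
# Minimal sketch of the subnetwork-probing idea described in the abstract.
# Grouping unit, group count, and the greedy accept/revert loop are assumptions.
import torch
import torch.nn as nn


def group_importance(linear: nn.Linear, inputs: torch.Tensor, num_groups: int = 16):
    """Sort output units of a linear layer by mean |activation| and chunk them into groups."""
    with torch.no_grad():
        acts = linear(inputs).abs().mean(dim=0)      # per-output-unit activation magnitude
    order = torch.argsort(acts, descending=True)     # sort units, most active first
    groups = list(order.chunk(num_groups))           # contiguous groups of sorted units
    scores = torch.stack([acts[g].mean() for g in groups])
    return groups, scores


def prune_least_significant(linear: nn.Linear, inputs: torch.Tensor,
                            val_loss, num_groups: int = 16, tolerance: float = 0.01):
    """Greedily zero out the lowest-scoring groups while validation loss stays within tolerance."""
    groups, scores = group_importance(linear, inputs, num_groups)
    baseline = val_loss()
    kept = torch.ones(linear.out_features, dtype=torch.bool)
    for g in torch.argsort(scores).tolist():          # least significant groups first
        idx = groups[g]
        with torch.no_grad():
            saved = linear.weight[idx].clone()
            linear.weight[idx] = 0.0                  # tentatively prune this group
        if val_loss() <= baseline + tolerance:
            kept[idx] = False                         # pruning accepted: drop group from subnetwork
        else:
            with torch.no_grad():
                linear.weight[idx] = saved            # pruning hurt the task: revert
    return kept                                       # boolean mask of retained output units
```

Here `val_loss` stands in for a user-supplied callable that evaluates the model on the small task-specific validation set mentioned in the abstract; in the full setting this scoring-and-pruning pass would presumably be applied layer by layer across the model to obtain a task-specialized subnetwork.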
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4206