Meta-Sparsity: Learning Optimal Sparse Structures in Multitask Networks through Meta-learning

TMLR Paper3771 Authors

27 Nov 2024 (modified: 11 Mar 2025) · Rejected by TMLR · CC BY 4.0
Abstract: This paper presents meta-sparsity, a framework for learning model sparsity, i.e., learning the parameter that controls the degree of sparsity, so that deep neural networks (DNNs) inherently generate optimal sparse shared structures in a multi-task learning (MTL) setting. Unlike traditional sparsity methods that rely heavily on manual hyperparameter tuning, the proposed approach enables the dynamic learning of sparsity patterns across a variety of tasks. Inspired by Model-Agnostic Meta-Learning (MAML), the emphasis is on learning shared and optimally sparse parameters in multi-task scenarios by applying a penalty-based, channel-wise structured sparsity during the meta-training phase. This method improves the model's efficacy by removing unnecessary parameters and enhances its ability to handle both seen and previously unseen tasks. The effectiveness of meta-sparsity is rigorously evaluated through extensive experiments on two datasets, NYU-v2 and CelebAMask-HQ, covering a broad spectrum of tasks ranging from pixel-level to image-level predictions. The results show that the proposed approach performs well across many tasks, indicating its potential as a versatile tool for creating efficient and adaptable sparse neural networks. This work therefore presents an approach to learning sparsity, contributing to the efforts in the field of sparse neural networks and suggesting new directions for research towards parsimonious models.
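To make the idea described in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of the core mechanics: a channel-wise group-lasso penalty on the shared convolutional weights, whose coefficient (`log_lam` here) is meta-learned MAML-style by differentiating a query loss through one inner adaptation step. The toy backbone, the single task head, the `channel_group_lasso` helper, and the single inner step are illustrative assumptions, not the paper's actual architecture or implementation (which uses a shared multi-task backbone with task-specific decoders on NYU-v2 and CelebAMask-HQ).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch >= 2.0


def channel_group_lasso(params: dict) -> torch.Tensor:
    """Channel-wise structured penalty: sum of L2 norms of each
    output-channel group of every 4-D (conv) weight tensor."""
    total = torch.zeros(())
    for name, w in params.items():
        if w.dim() == 4:  # conv weight: (out_channels, in_channels, kH, kW)
            total = total + w.flatten(1).norm(dim=1).sum()
    return total


# Toy shared backbone with a single task head (illustrative only).
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)

log_lam = torch.zeros((), requires_grad=True)  # meta-learned sparsity strength
meta_opt = torch.optim.Adam([log_lam] + list(backbone.parameters()), lr=1e-3)
inner_lr = 1e-2

for step in range(3):  # a few dummy meta-iterations on random data
    x_s, y_s = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))  # support
    x_q, y_q = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))  # query

    params = dict(backbone.named_parameters())

    # Inner step: adapt the shared weights on the support set, with the
    # group-sparsity penalty scaled by the *current* learned coefficient.
    inner_loss = F.cross_entropy(functional_call(backbone, params, (x_s,)), y_s)
    inner_loss = inner_loss + log_lam.exp() * channel_group_lasso(params)
    grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
    adapted = {k: v - inner_lr * g for (k, v), g in zip(params.items(), grads)}

    # Outer step: the query loss is computed with the adapted weights, so its
    # gradient w.r.t. log_lam (and the original weights) flows back through
    # the inner update, MAML-style.
    outer_loss = F.cross_entropy(functional_call(backbone, adapted, (x_q,)), y_q)
    meta_opt.zero_grad()
    outer_loss.backward()
    meta_opt.step()

print("learned sparsity coefficient:", log_lam.exp().item())
```

In the paper's MTL setting, the support and query losses would be summed over sampled tasks with task-specific heads rather than computed for a single toy task; the sketch already places the shared backbone parameters in the meta-optimizer alongside the sparsity coefficient to reflect that both are meta-learned.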
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=tT0gXgiPU5
Changes Since Last Submission: The reviewers and the action editor raised the following three comments on the previous submission:
1. Insufficient baseline comparison, particularly with a meta-learning baseline, to observe the effect of sparsity and meta-learning.
2. Theoretical justification of how meta-learning enhances task generalization in this work.
3. Clarification and discussion of performance stability.

The following updates have been made in the manuscript to address these comments:
1. We revised the contributions in the introduction to soften the claims of generalization.
2. A theoretical justification is added in Section 3, which describes the theory behind generalization w.r.t. all the critical components of this work and lays out the theoretical framework behind the generalization of the proposed meta-sparsity approach.
3. The MTL + meta-learning (i.e., meta-learning baseline) experiments are added in Section 4, which describes the experimental setup. The outcomes of the meta-learning baseline experiments are added in Tables 3 and 4 for performance comparison with meta-sparsity, for two meta-testing settings:
 * on the addition of new tasks;
 * on the meta-trained tasks.
4. In Section 5, three new subsections are added that discuss:
 * performance stability;
 * stability of the percentage sparsity across experiments and datasets;
 * a comparison and discussion of the meta-learning baseline experiments relative to meta-sparsity.

Other than these, minor changes were made in the manuscript related to grammar and spelling corrections. In response to the feedback provided by the reviewers and the action editor, we have incorporated all the recommended suggestions to enhance the quality of our work. We thank the reviewers and the action editor again for helping to improve the manuscript and for suggesting a resubmission after incorporating the comments.
Assigned Action Editor: ~Han_Zhao1
Submission Number: 3771