Keywords: model pruning, model compression
Abstract: Though foundation models are powerful, they are large and require substantial memory and computational resources for serving.
To tackle this issue, many pruning methods have been proposed to reduce the model size, thereby achieving memory and computational efficiency. These methods either identify and retrain the important weights or \textit{adjust the unpruned weights} to compensate for the removed weights. In this paper, we propose a novel approach called input compensation (IC) to boost the performance of pruned models, i.e., we \textit{adjust the input} to compensate for the removed weights. We learn a compensation pool to construct input-dependent compensation that reduces the error caused by pruning. Unlike existing pruning methods, which are designed in the parameter space, the proposed IC is designed in the input space. Hence, IC is complementary to existing methods and can be integrated with them.
Extensive experiments on various tasks, including image classification, language modeling, and image generation, demonstrate that IC is effective in improving the performance of pruned models.
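To make the idea described in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch of input compensation for a single pruned linear layer. The names (`InputCompensatedLinear`, `calibrate`), the softmax gate, the pool size, and the reconstruction objective are illustrative assumptions for exposition only, not the authors' actual method or code.

```python
# Hypothetical sketch: adjust the *input* of a frozen pruned layer, not its weights.
import torch
import torch.nn as nn

class InputCompensatedLinear(nn.Module):
    """Wraps a (frozen) pruned linear layer and compensates in the input space."""
    def __init__(self, pruned_linear: nn.Linear, pool_size: int = 8):
        super().__init__()
        self.pruned = pruned_linear                              # pruned weights stay fixed
        d_in = pruned_linear.in_features
        self.pool = nn.Parameter(torch.zeros(pool_size, d_in))   # learnable compensation pool
        self.gate = nn.Linear(d_in, pool_size)                   # input-dependent coefficients

    def forward(self, x):
        coeff = torch.softmax(self.gate(x), dim=-1)              # (..., pool_size)
        compensation = coeff @ self.pool                         # mix pool entries per input
        return self.pruned(x + compensation)                     # adjusted input, unchanged weights

def calibrate(dense: nn.Linear, pruned: nn.Linear, xs: torch.Tensor,
              steps: int = 200, lr: float = 1e-3, pool_size: int = 8):
    """Fit the pool/gate so the compensated pruned layer matches the dense layer on xs."""
    model = InputCompensatedLinear(pruned, pool_size)
    for p in model.pruned.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam([model.pool, *model.gate.parameters()], lr=lr)
    target = dense(xs).detach()                                  # dense outputs as the target
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(xs) - target) ** 2).mean()                # pruning-error reconstruction
        loss.backward()
        opt.step()
    return model

# Example usage with toy magnitude pruning (for illustration only).
dense = nn.Linear(64, 64)
pruned = nn.Linear(64, 64)
with torch.no_grad():
    mask = dense.weight.abs() > dense.weight.abs().median()     # keep the larger half
    pruned.weight.copy_(dense.weight * mask)
    pruned.bias.copy_(dense.bias)
compensated = calibrate(dense, pruned, torch.randn(256, 64))
```

Because only the pool and gate are trained while the pruned weights stay frozen, a compensator of this kind could in principle be layered on top of any existing pruning method, which is the sense in which the abstract calls IC complementary.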
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6925