Abstract: Although foundation models are powerful, they are large and require substantial memory and computational resources to serve. To address this issue, many pruning methods have been proposed to reduce model size and thereby improve memory and computational efficiency. These methods _adjust the retained weights_ to compensate for the removed weights. In this paper, we propose a novel approach called input compensation (IC) to improve the performance of pruned models: it _adjusts the input_ to compensate for the removed weights. Unlike existing pruning methods, which operate in the parameter space, the proposed IC operates in the input space. Hence, IC is complementary to existing methods and can be integrated with them. Extensive experiments on various tasks, including image classification, language modeling, and image generation, demonstrate that IC effectively boosts the performance of pruned models.
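To make the input-space idea concrete, below is a minimal, hypothetical sketch, not the paper's actual algorithm: a learnable additive input delta is fitted on calibration data so that a pruned layer's outputs match the dense layer's. The toy linear models, the shared `delta` parameterization, and the MSE reconstruction objective are all illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Dense and magnitude-pruned linear layers; toy stand-ins for a foundation model.
dense = nn.Linear(16, 8)
pruned = nn.Linear(16, 8)
with torch.no_grad():
    pruned.weight.copy_(dense.weight)
    pruned.bias.copy_(dense.bias)
    # Zero out the 50% smallest-magnitude weights (unstructured pruning).
    threshold = dense.weight.abs().flatten().median()
    pruned.weight.mul_((dense.weight.abs() >= threshold).float())

# Input compensation (hypothetical minimal form): a single additive delta in
# input space, shared across samples, trained to offset the removed weights.
delta = nn.Parameter(torch.zeros(16))
opt = torch.optim.Adam([delta], lr=1e-2)

# Calibration data; the dense model's outputs serve as the reconstruction target.
x = torch.randn(256, 16)
with torch.no_grad():
    target = dense(x)

for step in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(pruned(x + delta), target)
    loss.backward()
    opt.step()

with torch.no_grad():
    print(f"error w/o IC:  {nn.functional.mse_loss(pruned(x), target):.4f}")
    print(f"error with IC: {nn.functional.mse_loss(pruned(x + delta), target):.4f}")
```

Because the compensation lives entirely in the input space, the pruned weights are untouched, which is why a scheme like this can be layered on top of any existing weight-space pruning method.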
Primary Area: General Machine Learning
Keywords: model pruning, model compression
Submission Number: 142