SlimLLaVA: Automatic Pruning for Large Vision-language Models

27 Sept 2024 (modified: 13 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Pruning, Large vision-language model, Generalization
TL;DR: The first work on pruning large vision-language models while maintaining generalization ability
Abstract: Multimodal large language models achieve strong performance on complex reasoning tasks, but their high model complexity hinders deployment, especially on resource-limited devices. In this paper, we propose an automatic pruning method for large vision-language models to enable efficient multimodal reasoning. Conventional methods leverage the training data of the original model to select the proper pruning ratio for different network components, but this is infeasible for large vision-language models because the web-scale training corpus makes the search cost prohibitive. In contrast, we use only a few samples to search for the desired pruning policy by maximizing its generalization ability to the unseen training data rather than the accuracy on the observed samples alone, so that an optimal accuracy-efficiency trade-off can be obtained for large vision-language models. Specifically, we formulate the generalization gap of the pruning policy based on the structural risk minimization principle. Using both the task performance and the generalization ability, we iteratively search for the optimal pruning policy in the given search space and optimize the vision projector to evolve the search space toward a higher performance upper bound. We conduct extensive experiments on the ScienceQA, VizWiz, MM-Vet, and LLaVA-Bench datasets for visual question answering. With only 64 samples for pruning policy search, our method achieves 83.05\% accuracy on ScienceQA and a 1.47$\times$ speedup over the dense LLaVA-v1.5-7B model.
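The sketch below illustrates the kind of few-sample pruning-policy search the abstract describes: candidate per-layer pruning ratios are scored by task performance on a small calibration set minus a structural-risk-minimization-style penalty standing in for the generalization gap. This is a minimal illustration, not the authors' implementation; all names (`few_shot_score`, `srm_penalty`, the penalty weight `LAMBDA`) and the random-search strategy are assumptions, and the alternating vision-projector optimization is omitted.

```python
# Minimal sketch (not the authors' released code) of few-sample pruning-policy
# search with an SRM-style objective: score = few-shot task performance minus a
# capacity penalty that stands in for the generalization-gap term.

import random

NUM_LAYERS = 32                    # e.g. depth of the LLaVA-v1.5-7B language model
RATIO_CHOICES = [0.0, 0.25, 0.5]   # per-layer pruning ratios in the search space (assumed)
NUM_SAMPLES = 64                   # few calibration samples, as in the paper
LAMBDA = 0.1                       # weight of the generalization-gap penalty (assumed)


def few_shot_score(policy):
    """Placeholder for evaluating the pruned model on the calibration samples.
    In practice this would prune the vision-language model per `policy` and
    measure task accuracy; here we return a synthetic value for illustration."""
    kept = [1.0 - r for r in policy]
    return sum(kept) / len(kept) + random.uniform(-0.02, 0.02)


def srm_penalty(policy):
    """Stand-in for the structural-risk-minimization generalization gap:
    a capacity term that grows with the retained model size relative to
    the number of calibration samples."""
    retained = sum(1.0 - r for r in policy)
    return (retained / NUM_SAMPLES) ** 0.5


def search(num_iters=200):
    """Random search over per-layer pruning ratios maximizing the penalized
    objective. The paper alternates such a search with vision-projector
    updates to evolve the search space; that step is omitted here."""
    best_policy, best_obj = None, float("-inf")
    for _ in range(num_iters):
        policy = [random.choice(RATIO_CHOICES) for _ in range(NUM_LAYERS)]
        obj = few_shot_score(policy) - LAMBDA * srm_penalty(policy)
        if obj > best_obj:
            best_policy, best_obj = policy, obj
    return best_policy, best_obj


if __name__ == "__main__":
    policy, obj = search()
    print("best objective:", round(obj, 4))
    print("mean pruning ratio:", round(sum(policy) / len(policy), 3))
```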
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8773