From Data to Model: A Survey of the Compression Lifecycle in MLLMs

Hao Wu, Junlong Tong, Xudong Wang, Yang Tan, Changyu Zeng, Anastasia Antsiferova, Xiaoyu Shen

Published: 27 Feb 2026, Last Modified: 17 Mar 2026 | Crossref | Everyone | CC BY-SA 4.0
Abstract: Multimodal Large Language Models (MLLMs) have demonstrated exceptional proficiency in perception and reasoning, yet their deployment is often constrained by the substantial computational and memory overhead of long multimodal token sequences. While numerous compression techniques have been proposed, existing approaches remain fragmented across pipeline stages, and the system-level connections among them are not yet clearly articulated. In this work, we present a unified perspective on the compression lifecycle of MLLMs, spanning the pipeline from raw data processing to language model inference. We organize compression methods according to their intervention points at the input, encoder, projector, and LLM levels. Across these levels, we distill five fundamental compression operations, namely dropping, aggregation, encoding, resampling, and skipping, which establish a consistent framework for analysis and facilitate an in-depth discussion of their underlying mechanisms. Furthermore, we discuss compression from the perspectives of system bottlenecks and multi-level composition, highlighting practical implications for selecting and combining techniques in efficient MLLM deployment. To support continuous updates and community tracking of the latest advances in this area, we maintain a public repository: https://github.com/EIT-NLP/Awesome-MLLM-Compression.