Keywords: visual foundation models, quantization, efficient adaptation, quantization-aware training
Abstract: Resource-limited downstream applications have created a growing need for strategies that jointly and efficiently adapt and deploy large language models.
However, when applied to visual foundation models, existing methods typically incur either high GPU memory consumption during adaptation or extra computation costs introduced by the adapters at deployment.
In this paper, we propose **E**fficient **Qu**antization-aware **A**daptation (EQuA) that achieves high efficiency in both adaptation and deployment for visual foundation models.
We observe that the dominant memory consumption arises from intermediate activations cached for backpropagation through the deep backbone and the activation quantizers. To address this issue, during adaptation we split a lightweight sub-network from the backbone to serve as a side adapter branch, and tailor two adaptation strategies that eliminate these cached activations, thereby significantly reducing memory consumption.
At deployment, the side adapter branch is merged back into the backbone, yielding a quantized model without any extra computation costs.
Extensive experiments on representative visual foundation models and diverse downstream tasks demonstrate that EQuA achieves a favorable trade-off between performance and efficiency.
For example, EQuA yields over 70\% GPU memory reduction compared to state-of-the-art baselines while maintaining competitive performance.
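The merge-at-deployment idea described above can be illustrated with a minimal sketch. Here the side branch is modeled as a low-rank update folded into a backbone linear layer; the names and the low-rank parameterization are illustrative assumptions, not EQuA's actual design.

```python
import numpy as np

# Hypothetical setup: a backbone linear layer with weight W and a lightweight
# side adapter branch parameterized by low-rank factors B @ A.
# (Illustrative only; EQuA's actual branch structure may differ.)
rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.standard_normal((d, d))        # frozen backbone weight
A = rng.standard_normal((r, d)) * 0.01 # side-branch down-projection
B = rng.standard_normal((d, r)) * 0.01 # side-branch up-projection

x = rng.standard_normal(d)

# During adaptation: backbone output plus the side-branch output.
y_adapt = W @ x + B @ (A @ x)

# At deployment: fold the side branch into the backbone weight, so the
# merged model runs a single matmul with no extra computation cost.
W_merged = W + B @ A
y_deploy = W_merged @ x

assert np.allclose(y_adapt, y_deploy)
```

Because the merged weight reproduces the adapted computation exactly, inference incurs no adapter overhead, which is the deployment-efficiency property the abstract claims.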
Primary Area: foundation or frontier models, including LLMs
Submission Number: 6116