Keywords: Vision-Language-Action, Quantization, Model Compression & Optimization
TL;DR: Modality-Aware Block Rotation (MABR) enables stable 4-bit quantization of OpenVLA by preserving the modality-specific structure that global rotation destroys.
Abstract: Vision-Language-Action (VLA) models unify perception, reasoning, and control, but their deployment is constrained by their large memory footprint.
Although post-training quantization (PTQ) is a promising solution, existing rotation-based methods fail under 4-bit quantization.
We attribute this failure to cross-modal heterogeneity: vision and language tokens share the same layers but exhibit highly heterogeneous activation statistics, resulting in a severe mismatch in both activation scaling and Hessian structure.
This mismatch fundamentally breaks the assumptions behind existing rotation-based and Hessian-aware quantization methods.
We propose \emph{Modality-Aware Block Rotation} (MABR), which preserves modality-specific channel structure by restricting rotation within modality-consistent groups.
This prevents the diffusion of dominant language activations into vision channels and enables stable low-bit quantization.
On OpenVLA-7B, MABR remains stable where naive 4-bit quantization collapses and substantially narrows the gap to full precision, incurring only a 3.0\% performance drop without any fine-tuning.
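The contrast between a global rotation and a modality-restricted (block-diagonal) rotation can be sketched on a toy activation vector. This is a minimal illustration under stated assumptions, not the authors' implementation: it hypothetically assumes the channels split into one contiguous "vision" group and one "language" group, and it uses a normalized Sylvester-Hadamard matrix as the rotation inside each group. The group sizes, the outlier value, and all function names are illustrative.

```python
import math

def hadamard(n):
    """Normalized Sylvester-Hadamard matrix of size n (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = [[1.0]]
    while len(h) < n:
        # Sylvester step: H_2k = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]

def block_rotation(group_sizes):
    """Block-diagonal orthogonal matrix with one Hadamard block per channel group,
    so a channel is only mixed with channels of its own (assumed) modality group."""
    d = sum(group_sizes)
    r = [[0.0] * d for _ in range(d)]
    offset = 0
    for g in group_sizes:
        h = hadamard(g)
        for i in range(g):
            for j in range(g):
                r[offset + i][offset + j] = h[i][j]
        offset += g
    return r

def rotate(x, r):
    """Row vector times matrix: y = x @ R."""
    return [sum(x[i] * r[i][j] for i in range(len(x))) for j in range(len(r[0]))]

def energy(y, channels):
    """Sum of squared activations over the given channel indices."""
    return sum(y[c] ** 2 for c in channels)

# Hypothetical 8-channel activation: channels 0-3 are assumed to be "vision",
# channels 4-7 "language", with one dominant language outlier at channel 6.
VISION = range(0, 4)
x = [0.5, -0.3, 0.2, 0.4, 0.1, 0.2, 40.0, -0.1]

y_global = rotate(x, hadamard(8))            # one rotation mixing all channels
y_block = rotate(x, block_rotation([4, 4]))  # rotation restricted to each group

print(f"vision energy before rotation: {energy(x, VISION):.2f}")
print(f"after global rotation:         {energy(y_global, VISION):.2f}")
print(f"after block rotation:          {energy(y_block, VISION):.2f}")
```

Both matrices are orthogonal, so total activation energy is preserved in either case. The global rotation, however, spreads the language outlier across every channel and raises the vision-channel energy from 0.54 to roughly 808, whereas the block rotation keeps it at 0.54 because vision channels are only rotated among themselves.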
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 123