LRP-QViT: Mixed-Precision Vision Transformer Quantization Using Layer Importance Score

Published: 01 Jan 2025 · Last Modified: 09 Nov 2025 · DSP 2025 · CC BY-SA 4.0
Abstract: We introduce LRP-QViT, an explainability-driven approach for mixed-precision bit allocation in Vision Transformers (ViTs). Our method assigns different bit widths to layers based on their importance scores. To quantify each layer's contribution to classification, we employ Layer-wise Relevance Propagation (LRP), which measures how much each layer contributes to the model's prediction. Using the LRP-based layer importance scores, we determine the optimal bit allocation under a model-size constraint via integer quadratic optimization. To ensure smooth precision transitions and prevent information bottlenecks, we incorporate bit-transition regularization. Additionally, we apply clipped channel-wise quantization, which mitigates inter-channel variation by removing outliers from post-LayerNorm activations and thereby improves quantization robustness. We validate LRP-QViT on ViT, DeiT, and Swin transformers across multiple datasets. Our results show that LRP-QViT, with both fixed-bit and mixed-bit post-training quantization, outperforms existing methods in 3-bit and 4-bit settings, demonstrating superior efficiency and accuracy.
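To illustrate the clipped channel-wise quantization idea described above, here is a minimal NumPy sketch: per-channel outliers in post-LayerNorm activations are clipped at a percentile before per-channel symmetric scales are computed. The function name, the percentile threshold, and the symmetric int-range mapping are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def clipped_channelwise_quantize(x, n_bits=4, clip_pct=0.999):
    """Hypothetical sketch: quantize activations x of shape
    (tokens, channels) with a per-channel scale, after clipping
    per-channel outliers at the clip_pct percentile."""
    qmax = 2 ** (n_bits - 1) - 1  # e.g. 7 for 4-bit symmetric quantization
    # Clip extreme values per channel before computing scales,
    # so a few outliers do not inflate the quantization range.
    lo = np.quantile(x, 1.0 - clip_pct, axis=0)
    hi = np.quantile(x, clip_pct, axis=0)
    x_clipped = np.clip(x, lo, hi)
    # Per-channel symmetric scale from the clipped range.
    scale = np.maximum(np.abs(x_clipped).max(axis=0), 1e-8) / qmax
    q = np.clip(np.round(x_clipped / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

# Example: 197 tokens x 768 channels, as in a ViT-Base layer.
x = np.random.randn(197, 768).astype(np.float32)
q, scale = clipped_channelwise_quantize(x, n_bits=4)
```

The key design point is that the scale is computed per channel rather than per tensor, so channels with small dynamic range are not drowned out by high-variance channels elsewhere in the activation.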