Data-Free Quantization of Vision Transformers Through Perturbation-Aware Image Synthesis

Published: 01 Jan 2024, Last Modified: 02 Aug 2025 · PRICAI (3) 2024 · CC BY-SA 4.0
Abstract: Vision Transformers (ViTs) have demonstrated outstanding performance on visual tasks. However, deploying ViT models and running inference on resource-constrained edge devices is challenging due to their high computational overhead. Existing quantization methods require access to the raw training data, which raises security and privacy concerns. To address this issue, this paper proposes a data-free quantization method named Perturbation-Aware Vision Transformer (PA-ViT), which effectively enhances the robustness of synthetic images and thereby improves the performance of downstream post-training quantization. Specifically, PA-ViT introduces perturbations to the synthetic images and then models the inconsistency between the attention maps and predicted labels of the perturbed and unperturbed images, as produced by the full-precision (FP) model. A loss function built on this inconsistency guides the generation of robust images. Experimental results on ImageNet demonstrate significant performance improvements over existing techniques, even surpassing quantization with real data. For instance, PA-ViT with Swin-T as the backbone achieves 5.29% and 4.93% improvements in top-1 accuracy over the state-of-the-art method when quantized to 4-bit and 8-bit precision, respectively, providing an excellent solution for data-free post-training quantization of vision transformers.
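To make the consistency objective described in the abstract concrete, the sketch below illustrates one plausible form of a perturbation-aware loss: perturb a synthetic image, run both versions through the FP model, and penalize disagreement between their attention maps and predicted label distributions. This is a minimal illustration, not the paper's implementation; the model interface (a ViT returning logits together with attention maps), the noise model, and all function and parameter names are assumptions made for the example.

```python
# Hypothetical sketch of a perturbation-consistency loss for guiding
# synthetic-image generation; names and interfaces are illustrative only.
import torch
import torch.nn.functional as F

def perturbation_consistency_loss(fp_model, images, noise_std=0.05,
                                  lambda_attn=1.0, lambda_pred=1.0):
    """Penalize inconsistency between the full-precision (FP) model's
    attention maps and predictions on clean vs. perturbed images.

    Assumes `fp_model(x)` returns (logits, attention_maps); this interface
    is an assumption for the purpose of this sketch.
    """
    # Perturb the synthetic images with small Gaussian noise (assumed perturbation).
    perturbed = images + noise_std * torch.randn_like(images)

    logits_clean, attn_clean = fp_model(images)
    logits_pert, attn_pert = fp_model(perturbed)

    # Attention-map inconsistency: mean squared difference of attention weights.
    attn_loss = F.mse_loss(attn_pert, attn_clean)

    # Prediction inconsistency: KL divergence between the two output distributions.
    pred_loss = F.kl_div(F.log_softmax(logits_pert, dim=-1),
                         F.softmax(logits_clean, dim=-1),
                         reduction="batchmean")

    return lambda_attn * attn_loss + lambda_pred * pred_loss
```

In a data-free setup, a loss of this shape would typically be minimized with respect to the synthetic images themselves (alongside the usual inceptionism-style objectives), so that the generated images elicit stable attention and predictions under perturbation before being used to calibrate the quantized model.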