Keywords: Quantization, ViT, Efficient, Accelerator
Abstract: Vision Transformer (ViT) has achieved significant success in computer vision, among which EfficientViT is widely adopted for its lightweight design.
However, EfficientViT remains difficult to deploy on edge devices such as FPGAs due to efficiency and accuracy concerns.
First, from the software perspective, existing quantization approaches fail to consider the inter-channel distribution relationship, which causes significant performance degradation under lower-bit settings.
Second, from the hardware perspective, current DSP-packing methods struggle to support the diverse kernel sizes and strides of the convolutions used in EfficientViT, resulting in redundant computation cycles or bit-width overflow.
Moreover, due to the mismatch in data layouts between convolution and linear attention, existing solutions require substantial memory resources for data reordering, which often stalls the pipeline.
In this paper, we propose a Quantization and Streamline Co-Design (QuS) framework for lower-bit EfficientViT deployment on FPGA.
It comprises three main components: an adaptive distribution-aware quantization strategy that provides effective quantization, a multi-computing-in-once packing strategy that improves DSP-packing efficiency, and a low-buffer streamline scheme for linear attention that eliminates pipeline stalls caused by mismatched layouts.
Experimental results show that our QuS framework achieves over 2200 FPS on EfficientViT, a $3.6\times$ speedup over the Jetson AGX Orin, and up to a $24\%$ accuracy improvement under 4-bit quantization.
Supplementary Material: zip
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 3865