FastViT: Real-Time Linear Attention Accelerator for Dense Predictions of Vision Transformer (ViT)

Published: 01 Jan 2025 · Last Modified: 29 Jul 2025 · ISCAS 2025 · CC BY-SA 4.0
Abstract: The commercial success of generative artificial intelligence (GenAI) has driven an exponential surge in demand for real-time Vision Transformer (ViT) inference, including latency-sensitive domains such as autonomous driving, medical imaging, and computational photography. This paper introduces FastViT, a high-performance, energy-efficient hardware accelerator for emerging kernel function-based linear attention mechanisms. By leveraging cost-efficient multiplication, mixed-precision quantisation, and an optimised dataflow, FastViT improves real-time performance on high-resolution dense prediction tasks. Experiments demonstrate that, compared to existing approaches, FastViT achieves higher throughput and energy efficiency with negligible accuracy degradation and balanced resource allocation. Future work will improve its scalability for next-generation hardware equipped with advanced DSP cores.
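For context, the sketch below illustrates kernel function-based linear attention, the attention family the accelerator targets: softmax(QKᵀ)V is replaced by φ(Q)(φ(K)ᵀV), so the cost scales linearly rather than quadratically with the number of tokens. This is a minimal NumPy reference assuming a generic ELU+1 feature map; the paper's specific kernel, quantisation scheme, and hardware dataflow are not reproduced here.

```python
import numpy as np

def elu_feature_map(x):
    # Assumed kernel feature map phi(.): elu(x) + 1 keeps activations positive.
    # The paper's exact kernel choice may differ.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """Kernel-based linear attention: compute phi(K)^T V once (d x d),
    so the cost grows linearly with the token count N instead of N^2."""
    Qp = elu_feature_map(Q)                         # (N, d)
    Kp = elu_feature_map(K)                         # (N, d)
    KV = Kp.T @ V                                   # (d, d), independent of N^2
    Z = Qp @ Kp.sum(axis=0, keepdims=True).T + eps  # (N, 1) normaliser
    return (Qp @ KV) / Z                            # (N, d)

# Toy usage: N tokens of dimension d, e.g. patches of a high-resolution image.
N, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (1024, 64)
```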