Optimized Coding and Parameter Selection for Efficient FPGA Design of Attention Mechanisms

Published: 2025, Last Modified: 10 Nov 2025 · FCCM 2025 · CC BY-SA 4.0
Abstract: Efficient utilization of on-chip computational and memory resources, together with optimized high-level synthesis (HLS) coding, is vital for maximizing parallelism and minimizing latency. This paper presents HLS algorithms that achieve high utilization of the processing elements (PEs) and thereby enhance parallelism. It also analyzes how the parameters of an attention layer affect latency, applies an efficient tiling technique, and explains how to select an optimized tile size (TS).
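The abstract does not spell out the kernel structure or the rule used to choose TS. As a rough illustration only, the sketch below shows in plain C++ (the style HLS tools accept) how a tile size typically enters an attention kernel: the score matrix S = Q·K^T is computed in TS × TS blocks from TS-row tiles of Q and K held in local buffers. The function names `tiled_scores` and `pick_tile_size`, the row-major layout, and the buffer-budget heuristic are hypothetical and are not taken from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: S = Q * K^T (N x N) for Q, K of shape N x D,
// stored row-major, computed one TS x TS output block at a time.
// Precondition (assumed for simplicity): TS > 0 and N % TS == 0.
std::vector<float> tiled_scores(const std::vector<float>& Q,
                                const std::vector<float>& K,
                                std::size_t N, std::size_t D, std::size_t TS) {
    assert(TS > 0 && N % TS == 0);
    std::vector<float> S(N * N, 0.0f);
    // Local tile buffers; on an FPGA these would map to on-chip memory.
    std::vector<float> q_tile(TS * D), k_tile(TS * D);
    for (std::size_t i0 = 0; i0 < N; i0 += TS) {
        // Load TS rows of Q into the local buffer.
        for (std::size_t i = 0; i < TS; ++i)
            for (std::size_t d = 0; d < D; ++d)
                q_tile[i * D + d] = Q[(i0 + i) * D + d];
        for (std::size_t j0 = 0; j0 < N; j0 += TS) {
            // Load TS rows of K into the local buffer.
            for (std::size_t j = 0; j < TS; ++j)
                for (std::size_t d = 0; d < D; ++d)
                    k_tile[j * D + d] = K[(j0 + j) * D + d];
            // Compute one TS x TS block of scores. In an HLS flow the i/j
            // loops are the natural candidates for unrolling across PEs.
            for (std::size_t i = 0; i < TS; ++i)
                for (std::size_t j = 0; j < TS; ++j) {
                    float acc = 0.0f;
                    for (std::size_t d = 0; d < D; ++d)
                        acc += q_tile[i * D + d] * k_tile[j * D + d];
                    S[(i0 + i) * N + (j0 + j)] = acc;
                }
        }
    }
    return S;
}

// Hypothetical tile-size heuristic: the largest TS that divides N and whose
// estimated on-chip footprint (two TS x D input tiles plus a buffered
// TS x TS output block, in elements) fits within budget_elems.
// Falls back to 1 if no divisor fits.
std::size_t pick_tile_size(std::size_t N, std::size_t D, std::size_t budget_elems) {
    std::size_t best = 1;
    for (std::size_t ts = 1; ts <= N; ++ts) {
        if (N % ts != 0) continue;
        std::size_t footprint = 2 * ts * D + ts * ts;
        if (footprint <= budget_elems) best = ts;
    }
    return best;
}
```

For example, with N = 8 and D = 4, a budget of 64 elements gives TS = 4 (footprint 48), while TS = 8 would need 128. A larger TS exposes more parallel work per block but consumes more on-chip memory. A tile-size selection procedure has to balance these two effects, although the paper's actual criterion may differ from this toy model.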