Abstract: Efficient utilization of on-chip computational and memory resources, together with optimized high-level synthesis (HLS) coding, is vital for maximizing parallelism and minimizing latency. This paper presents HLS algorithms that achieve high utilization of processing elements and thereby enhance parallelism. It also analyzes how various parameters of an attention layer affect latency, employs an efficient tiling technique, and explains how to select an optimized tile size (TS).
External IDs: dblp:conf/fccm/KabirDB0H25