The BRAM is the Limit: Shattering Myths, Shaping Standards, and Building Scalable PIM Accelerators

Published: 2024, Last Modified: 10 Nov 2025, FCCM 2024, CC BY-SA 4.0
Abstract: Many recent FPGA-based Processor-in-Memory (PIM) architectures promise impressive levels of parallelism but deliver performance that falls short of expectations. The causes are reduced maximum clock frequencies, an inability to scale processing elements up to the full BRAM capacity, and minimal hardware support for large reduction operations. In this paper, we propose a “Standard” set of design objectives for PIM array-based FPGA designs. We then present a PIM array-based GEMV accelerator architecture as a case study showing that the proposed Standard can be realized in practice. The GEMV accelerator serves as an existence proof that dispels several myths about what are commonly accepted as FPGA clocking and scaling limitations. Specifically, the proposed accelerator clocks at the maximum frequency of the BRAM and scales to 100% of the available BRAMs. Comparative analyses show faster execution than existing PIM-based GEMV engines on FPGAs, with a 2.65× to 3.2× faster clock. An AMD Alveo U55 implementation achieves a system clock speed of 737 MHz and provides 64K bit-serial multiply-accumulate (MAC) units for GEMV operations.
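To make the terms "bit-serial MAC unit" and "reduction operation" concrete, the following is a minimal software model, not the paper's architecture. It assumes unsigned integer operands, serializes only the multiplier (one bit per simulated clock cycle, shift-and-add), assigns one hypothetical PE to each matrix element, and uses a generic binary-tree sum as a stand-in for hardware reduction support. The names `bit_serial_mac`, `tree_reduce`, and `gemv_bit_serial` are illustrative only.

```python
from typing import List


def bit_serial_mac(acc: int, a: int, b: int, bits: int) -> int:
    """Accumulate a * b into acc, consuming one bit of b per simulated cycle.

    Assumption: a and b are unsigned, and b fits in `bits` bits.
    """
    if not (0 <= b < (1 << bits)) or a < 0:
        raise ValueError("operands must be unsigned and b must fit in `bits` bits")
    for i in range(bits):      # one iteration models one clock cycle of a bit-serial PE
        if (b >> i) & 1:       # the current multiplier bit decides whether to add
            acc += a << i      # add the multiplicand shifted to bit position i
    return acc


def tree_reduce(values: List[int]) -> int:
    """Pairwise (binary-tree) sum: about log2(n) levels of parallel additions."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)    # pad an odd-length level with a zero
        level = [level[k] + level[k + 1] for k in range(0, len(level), 2)]
    return level[0] if level else 0


def gemv_bit_serial(W: List[List[int]], x: List[int], bits: int = 8) -> List[int]:
    """Compute y = W x with one hypothetical PE per (row, column) element."""
    y = []
    for row in W:
        # Every PE in the row forms its product bit-serially (in parallel in hardware).
        partials = [bit_serial_mac(0, w, xc, bits) for w, xc in zip(row, x)]
        # The partial products are then combined by the reduction tree.
        y.append(tree_reduce(partials))
    return y


if __name__ == "__main__":
    print(gemv_bit_serial([[1, 2], [3, 4]], [5, 6]))  # [17, 39]
```

In this model, each product costs `bits` cycles regardless of the matrix size, so adding more PEs (more BRAMs, in the abstract's terms) raises throughput only if the reduction tree and the clock keep pace. That is the scaling concern the abstract raises.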