Abstract: Matrix-matrix multiplication (MM) of large matrices plays a crucial role in various applications, including machine learning. MM requires significant computational resources, but accessing memory can quickly become the bottleneck. Field-Programmable Gate Arrays (FPGAs) offer a range of memory options, such as Block RAM (BRAM), UltraRAM (DRAM), and High Bandwidth Memory (HBM), each with unique characteristics. In this study, we explore the optimal combination of HBM with either BRAM, URAM, or both, depending on the size of the input data. We employ Bayesian optimization to optimize the FPGA implementation, analyze the tradeoffs between different memory types, and determine the most suitable memory allocation based on memory sizes. Our findings provide insights for designers seeking to optimize their designs, and demonstrate that URAM outperforms a combination of BRAM and URAM when data fits in URAM. Overall, our approach enables more efficient memory allocation for larger matrix sizes on FPGAs compared to prior research.
Loading