Abstract: AI technology is evolving rapidly, and ongoing research and development frequently produce significant changes in model architectures. When designing AI hardware accelerators, flexibility is therefore crucial to accommodate future changes in AI model structure. FPGAs, increasingly used as AI accelerators, implement logic with lookup tables (LUTs) and are characterized by high flexibility; executing computations via LUTs rather than dedicated circuits is what provides this flexibility. In general, increasing the number of inputs to a LUT reduces FPGA latency but increases area. 3D-SRAM, which is approaching practical application, may allow the number of LUT inputs to be increased without a significant area penalty, potentially enhancing FPGA performance. However, 3D-SRAM has higher latency than the SRAM traditionally used in LUTs, so the latency reduction gained from larger LUT inputs may be negated. In this study, we developed a simulator for an FPGA equipped with large-input LUTs built from 3D-SRAM and conducted performance comparison experiments against conventional FPGAs.
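As a rough illustration of the area/latency tradeoff the abstract describes, the following back-of-envelope Python sketch (not part of the paper's simulator; the balanced-tree decomposition and the parameter values are illustrative assumptions) models a k-input LUT as a 2^k-bit truth table, so storage grows exponentially with k, while the number of LUT levels needed to cover an n-input function shrinks as k grows:

```python
def lut_bits(k):
    # A k-input LUT stores one output bit per input combination: 2**k bits.
    return 2 ** k

def tree_depth(n, k):
    # Rough depth (number of LUT levels) of a balanced tree of k-input LUTs
    # covering an n-input function; ignores routing delay and packing details.
    depth, covered = 0, 1
    while covered < n:
        covered *= k
        depth += 1
    return depth

# Illustrative comparison for a hypothetical 64-input function:
for k in (4, 6, 8):
    print(f"k={k}: {lut_bits(k):4d} bits/LUT, {tree_depth(64, k)} levels")
```

Under this toy model, moving from 4-input to 8-input LUTs cuts the logic depth from 3 levels to 2 but multiplies the per-LUT storage from 16 bits to 256 bits, which is exactly the tension the paper examines: 3D-SRAM could absorb the storage growth, but its higher access latency may offset the saved level of logic.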