Abstract: In this paper, we present a flexible and scalable architectural template for designing domain-specific FPGA overlays. The template can be parametrized in terms of the number and size of tiles, the communication topology, and the connectivity to external RAM. In addition to direct streaming connections between adjacent tiles for bulk data, it uses a novel lightweight packet-based on-chip network for small data transfers. The tiles consist of configurable routing resources and the actual compute units, which are exchanged at runtime by means of dynamic partial reconfiguration. This allows the assembly of custom, varying data paths for processing data flow graphs in hardware. Since every connection within and at the edge of the template is AXI-Stream-compliant, it is highly extensible using standard IP cores and streaming-based High-Level Synthesis code, enabling plug-and-play deployment. The feasibility of our approach is demonstrated by means of a domain-specific overlay for analytical database query processing on a Xilinx Alveo U280 card. TPC-H queries and synthetic workloads are used as test cases. Results from this demonstration system show that the overlay does not affect the performance of the compute units and that the lightweight packet-based on-chip network is effective in saving routing resources.