An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation

Published: 2024, Last Modified: 07 Jan 2026HPCA 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: With the increasing demand for computing capability given limited resource and power budgets, it is prominent to deploy applications to customized accelerators like FPGAs. However, FPGA programming is non-trivial. Although existing high-level synthesis (HLS) tools improve productivity to a certain extent, they are limited in scope and capability to support sufficient FPGA-oriented transformations and optimizations. This paper focuses on FPGA-based accelerators and proposes POM, an end-to-end optimizing framework built on multi-level intermediate representation (MLIR). POM has several features which demonstrate its scope and capability of performance optimization. First, most HLS tools depend exclusively on a single-level IR like LLVM IR to perform all the optimizations, introducing excessive information into the IR and making debugging an arduous task. In contrast, POM explicitly introduces three layers of IR to perform operations at suitable abstraction levels, streamlining the implementation and debugging process and exhibiting better flexibility, extensibility, and systematicness. Second, POM integrates the polyhedral model into MLIR and hence enables advanced dependence analysis and a wide range of FPGA-oriented loop transformations. By representing nested loops with integer sets and maps at suitable IR, loop transformations can be conducted conveniently through a series of manipulations on polyhedral semantics. Finally, to further relieve design effort, POM is equipped with a user-friendly programming interface (DSL) that allows a concise description of computation and includes a rich collection of scheduling primitives. An automatic design space exploration (DSE) engine is also provided to search for high-performance optimization schemes efficiently and generate optimized accelerators automatically. Experimental results show that POM achieves a 6.46× average speedup on typical benchmark suites and a 6.06 ×average speedup on real-world applications compared to the state-of-the-art.
Loading