Abstract: High-level synthesis (HLS) automatically transforms high-level programs in a language such as C/C++ into a low-level hardware description. In this context, loop pipelining is a key optimisation method for improving hardware performance. The main performance bottleneck of a pipelined loop is the ratio between two values: the latency of each iteration and the dependence distance of the operations in the loop. These two values are usually not known exactly, so existing HLS schedulers model them independently, which can cause sub-optimal performance. This paper extends state-of-the-art static schedulers with a fully automated pass that exposes and takes advantage of potential correlation between these two values, enabling smaller initiation intervals (II). We use the Microsoft Boogie software verifier to prove the existence of these correlations, which allows HLS tools to automatically find a high-performance hardware solution while maintaining correctness. Our results show that for a certain class of programs, our approach achieves, on average, an $11.1\times$ performance gain at the cost of a 95% area overhead.
0 Replies
Loading