Abstract: Most dataflow accelerator compilers achieve high performance by mapping each node in a dataflow program to a dedicated hardware element on a dataflow accelerator. However, this approach misses critical data reuse optimizations required to exploit the data bandwidth from fine-grained memory elements, e.g., FIFOs and pipeline registers. Moreover, writing performant dataflow programs requires users to have domain expertise in the underlying dataflow accelerators. To address these issues, we designed Sigma, a novel compiler that supports high-level programming constructs such as Einstein summations, index notation, and tensors; finds opportunities for data reuse in high-level dataflow graphs; and exploits on-chip data bandwidth from fine-grained memory elements. When targeting a research dataflow accelerator, Sigma demonstrates a 5.4x speedup and a 44.6x area-normalized speedup over Nvidia's V100 accelerator, and a 7.1x speedup over hand-written dataflow programs.