Abstract: The memory subsystem of a typical coarse-grained reconfigurable architecture (CGRA), comprising the data memory and the instruction memory, often takes up considerable chip area and can even dominate CGRA performance. We observe that the instruction memory in CGRAs is commonly highly under-utilized while the data memory is over-committed, or vice versa. Based on this observation, we propose UM-CGRA, a CGRA with a unified memory architecture that enables flexible on-chip memory sharing between data and instructions. In addition, the PEs are augmented to share data between neighbors working in parallel, and an on-chip memory sharing-aware mapping algorithm is developed to unleash the potential of the proposed architecture. Our experimental results show that UM-CGRA achieves a 77% performance improvement on average over the baseline CGRA given the same amount of total on-chip memory. For the same performance goal, UM-CGRA achieves a 10.7% chip area saving and a 28.6% energy efficiency improvement on average.