Optimizing Data Reuse for Loop Mapping on CGRAs With Joint Affine and Nonaffine Transformations

Published: 2025, Last Modified: 28 Jan 2026, Venue: IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2025, License: CC BY-SA 4.0
Abstract: Coarse-grained reconfigurable arrays (CGRAs) provide high energy efficiency while maintaining flexibility, which makes them promising for keeping pace with power requirements and the frequent updates of applications. With flexible register chains, modern CGRAs enable data reuse within the processing element array (PEA), which reduces on-chip memory accesses and improves pipelining performance. However, existing works pay little attention to comprehensive loop transformations, such as affine and nonaffine transformations, for obtaining a data-reuse-friendly loop structure. Therefore, this article proposes a data-reuse-friendly loop mapping approach using joint affine and nonaffine transformations. With affine transformations, the distances of loop dependencies can be reduced so that the dependencies can be handled by in-PEA routes. With nonaffine transformations (i.e., loop unrolling), small loop kernels can be unrolled to expose more memory accesses for data reuse. To solve the loop transformation problem efficiently, we first establish a reduced polyhedral formulation and then propose a divide-and-conquer-based solution that finds optimized transformations with moderate compilation time. Experimental results demonstrate that our approach achieves a speedup of up to $1.74\times$ compared to state-of-the-art methods.
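The two effects described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the article's formulation or algorithm. The helper names (`transform_distance`, `is_lex_positive`, `loads_per_unrolled_body`), the skew matrix, and the example loop bodies are all hypothetical choices made for illustration. The first part applies an integer affine transformation to a dependence distance vector and checks that the result stays lexicographically positive, which is the usual legality condition. The second part counts how many array loads in an unrolled loop body touch distinct elements, so the difference gives the loads that could be served by reuse.

```python
def transform_distance(T, d):
    """Apply an integer loop transformation matrix T to a distance vector d."""
    return tuple(sum(T[r][c] * d[c] for c in range(len(d))) for r in range(len(T)))


def is_lex_positive(v):
    """Return True if the first nonzero component of v is positive."""
    for x in v:
        if x > 0:
            return True
        if x < 0:
            return False
    return False  # the all-zero vector is not lexicographically positive


def loads_per_unrolled_body(offsets, factor):
    """Count (total, unique) loads for a body reading a[i + o] for each o in
    offsets, after unrolling the i loop `factor` times."""
    accessed = [u + o for u in range(factor) for o in offsets]
    return len(accessed), len(set(accessed))


# Assumed example: a dependence with distance (1, 3) in an (i, j) loop nest.
# A skew of the inner loop by -3 * i shortens the inner distance to 0.
skew = [[1, 0],
        [-3, 1]]
original = (1, 3)
transformed = transform_distance(skew, original)
print(original, "->", transformed, "legal:", is_lex_positive(transformed))

# Assumed example: a small kernel b[i] = a[i] + a[i + 1].
for factor in (1, 2, 4):
    total, unique = loads_per_unrolled_body((0, 1), factor)
    print(f"unroll x{factor}: {total} loads, {unique} unique, {total - unique} reusable")
```

In this sketch the skew maps the distance `(1, 3)` to `(1, 0)`, and unrolling the two-load kernel by 4 yields 8 loads of which only 5 are distinct. Whether such a shortened dependence or repeated load can actually be kept inside the PEA depends on the target architecture's register chains and routing, which the sketch does not model.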