Abstract: Lookaside Accelerators (LAAs) offer a compelling approach to enhancing system performance by offloading computational workloads from general-purpose CPUs, with substantial potential for next-generation accelerated computing. However, their effectiveness depends heavily on system architecture, particularly on mitigating the communication overhead and inefficiencies that can erode anticipated performance gains. This study investigates the key factors contributing to communication overhead in LAA-enabled architectures and analyzes their collective impact on performance. Specifically, we evaluate critical architectural considerations, including the cache hierarchy, data-sharing mechanisms, and communication strategies such as interrupt-driven versus polling-based interaction. To assess these factors, we implement an LAA as an emulated design on an FPGA integrated with ARM CPUs on an AMD MPSoC platform. Through extensive evaluation, we analyze the performance implications of varying the size of the data offloaded from an embedded ARM core to the accelerator. Our findings highlight the significant impact of workload data sizes, the influence of shared-memory access patterns, and the trade-offs that different design choices impose on accelerator communication performance. Additionally, this work characterizes the structure of communication overhead and provides optimization insights for accelerator design, ultimately aiming to enable higher performance in SoCs and MPSoCs with embedded acceleration technologies.
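The interrupt-driven versus polling-based contrast the abstract evaluates can be illustrated with a toy host-side model. This is a minimal sketch, not the paper's implementation: the `EmulatedLAA` class, its flag/event names, and the doubling "computation" are all illustrative stand-ins, with a thread playing the role of the FPGA accelerator, a plain flag standing in for a polled status register, and a `threading.Event` standing in for an interrupt.

```python
import threading

class EmulatedLAA:
    """Toy stand-in (hypothetical, not from the paper) for a lookaside
    accelerator: a worker thread processes offloaded data and signals
    completion two ways -- a polled status flag and an event that plays
    the role of an interrupt."""

    def __init__(self):
        self.done_flag = False               # polled "status register"
        self.done_event = threading.Event()  # "interrupt" analogue
        self.result = None

    def offload(self, data):
        # Host hands work to the accelerator and returns immediately.
        self.done_flag = False
        self.done_event.clear()
        self.result = None
        threading.Thread(target=self._run, args=(data,)).start()

    def _run(self, data):
        # Placeholder computation standing in for the accelerated kernel.
        self.result = [x * 2 for x in data]
        self.done_flag = True    # update the polled status
        self.done_event.set()    # deliver the "interrupt"

def wait_polling(acc):
    # Busy-wait on the status flag: low completion-detection latency,
    # but the CPU burns cycles while waiting.
    while not acc.done_flag:
        pass
    return acc.result

def wait_interrupt(acc):
    # Block until signalled: frees the CPU for other work,
    # at the cost of wakeup latency.
    acc.done_event.wait()
    return acc.result

if __name__ == "__main__":
    acc = EmulatedLAA()
    acc.offload([1, 2, 3])
    print(wait_polling(acc))    # [2, 4, 6]

    acc.offload([4, 5])
    print(wait_interrupt(acc))  # [8, 10]
```

The trade-off the sketch exposes mirrors the one the study measures: polling minimizes the delay between completion and detection, which favors small offloads, while interrupt-style blocking releases the core during long-running tasks, which favors large ones.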
External IDs: dblp:conf/socc/BolatSMHK25