Acamar: A Dynamically Reconfigurable Scientific Computing Accelerator for Robust Convergence and Minimal Resource Underutilization
Abstract: Although modern supercomputers can now deliver exaflops, they do not always achieve their peak performance. For instance, even today's high-end supercomputers achieve less than 5% of their peak FLOPS when running HPCG, a benchmark designed to represent real-world scientific computing programs. To improve the efficiency of the key kernels in scientific computing, such as those used in solving partial differential equations, computer architects have begun to extend domain-specific architectures (DSAs) to scientific computing. However, DSAs, which often have a fixed design, are unlikely to be practical solutions, because one specialized design cannot fit all the diverse scientific computing workloads. The hardware inefficiency of today's supercomputers and the ineffectiveness of DSAs are further exacerbated by sparsity, a key characteristic of scientific computing workloads. While prior studies have proposed DSA solutions for sparse computations, they too are static and cannot adapt to variations in the patterns and levels of sparsity across different scientific workloads. To address these challenges and target not only the diversity of computations in such workloads but also variations in sparsity, we propose Acamar, a dynamically reconfigurable accelerator. (Acamar /ˈeɪkəmɑːr/ is a binary star system in the constellation of Eridanus.) Acamar adapts to various solvers across different workloads and dynamically optimizes the trade-off between resource utilization and latency for sparse computations. The adaptable design also enables selecting a solver that guarantees convergence. We evaluate Acamar based on its Vitis HLS implementation on a Xilinx Alveo u55c. Our experiments show improvements in resource utilization and latency of up to 3.5× and 6×, respectively, as well as improved performance efficiency and achieved throughput, over a static design and an Nvidia GTX 1650 Super.