Amber: A 367 GOPS, 538 GOPS/W 16nm SoC with a Coarse-Grained Reconfigurable Array for Flexible Acceleration of Dense Linear Algebra
Abstract: Amber is a system-on-chip (SoC) with a coarse-grained reconfigurable array (CGRA) for acceleration of dense linear algebra applications such as machine learning (ML), image processing, and computer vision. It achieves a peak energy efficiency of 538.0 INT16 GOPS/W and 483.3 BFloat16 GFLOPS/W. We maximize CGRA utilization and minimize reconfigurability overhead through (1) dynamic partial reconfiguration of the CGRA that enables higher resource utilization by allowing multiple applications to run at once, (2) efficient streaming memory controllers supporting affine access patterns, and (3) low-overhead transcendental and complex arithmetic operations. Compared to a CPU, a GPU, and an FPGA, Amber achieves up to 3902x, 152x, and 88x better energy-delay product (EDP).
Loading