Abstract: In recent years, there has been active growth in designing application-specific architectures to accelerate Convolutional Neural Networks (CNNs). Among CNN architectures, the recently introduced EfficientNet has emerged as a state-of-the-art model; it uses an extensible compound scaling method to increase network capacity and achieve higher accuracy with relatively low computational demand. However, application-specific architectures that fully capitalize on the nuances and benefits of EfficientNet are still lacking. This paper presents Tufan, a throughput-oriented architecture for accelerating EfficientNet on cloud FPGAs. Tufan is a unified framework that supports the various EfficientNet family architectures, which exhibit structural sparsity. The accelerator design introduces parameterizable, configurable, and scalable compute units that can be configured according to user-specific requirements, the EfficientNet model, and the batch size. We assess the energy efficiency of Tufan by executing a set of EfficientNet family configurations on Xilinx's Alveo U50 FPGA board and on an Nvidia Tesla P100 GPU. Our experimental results show that Tufan improves energy efficiency by 7.81% over the P100 GPU for a batch size of 28 at 300 MHz.