An Area-Efficient CNN Accelerator Supporting Global Average Pooling with Arbitrary Shapes

Yichuan Bai; Xiaopeng Zhang; Qian Wang; Jingjing Lv; Lei Chen; Yuan Du; Li Du

Published: 01 Jan 2024, Last Modified: 13 Nov 2024. AICAS 2024. CC BY-SA 4.0

Abstract: Integrating dedicated convolutional neural network (CNN) accelerators into processing chips has become a common solution for efficient CNN inference in Internet-of-Things (IoT) devices. Fully in-accelerator processing of the different computational layers is essential for supporting a wide range of CNN models. However, previous works lack an in-depth discussion of the hardware implementation of global average pooling (GAP) layers, which are widely used in classification models. This paper proposes a novel CNN accelerator with high area efficiency for event-driven IoT applications. Fully in-accelerator processing is supported for popular CNN models, such as MobileNet V2 and ResNet34. GAP layers with arbitrary shapes are also supported through software-hardware co-design, enabling low-cost deployment of customized CNN models. Compared with the reference design, the proposed design reduces the gate count of the pooling module by 45.1% and improves the area efficiency of the overall CNN accelerator by 13.2%, with negligible accuracy loss.
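To make the GAP operation concrete, the following is a minimal sketch of one common way to average a feature map of arbitrary height and width without a hardware divider. It is an illustration only and is not taken from the paper: the abstract does not describe the actual co-design. The sketch assumes that software precomputes a fixed-point reciprocal of the window size and that hardware only accumulates, multiplies, and shifts. The names `gap_reciprocal`, `gap_fixed_point`, and the `frac_bits` parameter are hypothetical.

```python
def gap_reciprocal(n, frac_bits=16):
    """Software side (assumed): precompute round(2**frac_bits / n)."""
    if n <= 0:
        raise ValueError("window size must be positive")
    return ((1 << frac_bits) + n // 2) // n


def gap_fixed_point(fmap, frac_bits=16):
    """Hardware-style GAP (assumed scheme): accumulate, multiply, shift.

    fmap is a list of channels; each channel is a list of H rows,
    each row a list of W integer activations. H and W may be any
    positive sizes and need not be equal.
    """
    half = 1 << (frac_bits - 1)  # half an LSB, for round-to-nearest
    out = []
    for channel in fmap:
        height = len(channel)
        width = len(channel[0])
        recip = gap_reciprocal(height * width, frac_bits)
        acc = sum(sum(row) for row in channel)  # accumulate all pixels
        # Multiply by the reciprocal, then shift right instead of dividing.
        out.append((acc * recip + half) >> frac_bits)
    return out


if __name__ == "__main__":
    square = [[[10] * 7 for _ in range(7)]]                   # 1 channel, 7x7
    rect = [[list(range(r * 5, r * 5 + 5)) for r in range(3)]]  # 1 channel, 3x5
    print(gap_fixed_point(square), gap_fixed_point(rect))
```

In this sketch the only shape-dependent value is the reciprocal constant, so a non-square or unusually sized window changes a configuration word rather than the datapath. The rounding error depends on `frac_bits` and on the accumulator magnitude, so a real design would need to size both for its activation range.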
