An Area-Efficient CNN Accelerator Supporting Global Average Pooling with Arbitrary Shapes

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, Venue: AICAS 2024, License: CC BY-SA 4.0
Abstract: Integrating dedicated convolutional neural network (CNN) accelerators into processing chips has become a common solution for efficient CNN inference in Internet-of-Things (IoT) devices. Fully in-accelerator processing of the different computational layers is essential to support a wide range of CNN models. However, previous works lack an in-depth discussion of the hardware implementation of global average pooling (GAP) layers, which are widely used in classification models. This paper proposes a novel CNN accelerator with high area efficiency for event-driven IoT applications. Fully in-accelerator processing is supported for popular CNN models, such as MobileNet V2 and ResNet34. GAP layers with arbitrary shapes are also supported through software-hardware co-design, enabling low-cost deployment of customized CNN models. Compared with the reference, the proposed design reduces the gate count of the pooling module by 45.1% and improves the area efficiency of the overall CNN accelerator by 13.2%, with negligible accuracy loss.
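For readers unfamiliar with the layer, the following is a minimal Python sketch of what a GAP layer computes: one mean per channel over an H x W feature map of any shape. The second function is an illustrative assumption only, not the method of the paper: it shows one common way to avoid a hardware divider by multiplying the accumulated sum by a reciprocal of H*W that software precomputes. The function names and the `frac_bits` parameter are hypothetical.

```python
def global_average_pool(feature_map):
    """Reference GAP: feature_map is a list of C channels,
    each an H x W list of lists. Returns one mean per channel."""
    out = []
    for channel in feature_map:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)  # H * W, any shape
        out.append(total / count)
    return out


def global_average_pool_fixed_point(feature_map, frac_bits=16):
    """Division-free variant (illustrative assumption, not the paper's design):
    the reciprocal of H * W is precomputed as a fixed-point constant,
    so the datapath needs only an accumulator, a multiplier, and a shift."""
    out = []
    for channel in feature_map:
        count = sum(len(row) for row in channel)
        # Precomputed offline for the given shape, rounded to frac_bits bits.
        recip = round((1 << frac_bits) / count)
        total = sum(sum(row) for row in channel)
        # Multiply, add half an LSB for rounding, then shift back to integers.
        out.append((total * recip + (1 << (frac_bits - 1))) >> frac_bits)
    return out


if __name__ == "__main__":
    # Two channels with a 2 x 3 shape (H * W = 6 is not a power of two).
    fm = [[[1, 2, 3], [4, 5, 6]],
          [[10, 10, 10], [10, 10, 10]]]
    print(global_average_pool(fm))
    print(global_average_pool_fixed_point(fm))
```

Because the reciprocal is rounded to a finite number of bits, the fixed-point result can differ slightly from the exact mean; increasing `frac_bits` reduces that error at the cost of a wider multiplier.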