Enabling Highly Efficient Capsule Networks Processing Through Software-Hardware Co-Design

IEEE Trans. Computers 2021 (modified: 10 Jun 2022)
Abstract: As the demand for image processing increases, image features become increasingly complex. Although Convolutional Neural Networks (CNNs) have been widely adopted for image processing tasks, they are easily misled due to their heavy use of pooling operations. Capsule Networks (CapsNet), a novel neural network structure, have been proposed to address this CNN challenge and substantially enhance learning ability for image segmentation and object detection. Since CapsNet involves a high volume of matrix operations, it is generally accelerated on modern GPU platforms with highly optimized deep-learning libraries. However, the routing procedure of CapsNet introduces special program and execution features, including massive unshareable intermediate variables and intensive synchronizations, causing inefficient CapsNet execution on modern GPUs. To address these challenges, we propose software-hardware co-designed optimizations, SH-CapsNet, which include software-level optimizations named <i>S-CapsNet</i> and a hybrid computing architecture design named <i>PIM-CapsNet</i>. At the software level, S-CapsNet reduces computation and memory accesses by exploiting the computational redundancy and data similarity of the routing procedure. At the hardware level, PIM-CapsNet leverages the processing-in-memory capability of today's 3D stacked memory to provide an off-chip in-memory acceleration solution for the routing procedure, while pipelining with the GPU's on-chip computing capability to accelerate the CNN-type layers in CapsNet. Evaluation results demonstrate that either our software or hardware optimizations alone can significantly improve CapsNet execution efficiency.
Together, our co-design achieves substantial improvements in both performance (3.41x speedup) and energy savings (68.72 percent) for CapsNet inference, with negligible accuracy loss.
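To make the bottleneck concrete: the routing procedure discussed above is the iterative routing-by-agreement algorithm from the original CapsNet proposal (Sabour et al., 2017). The sketch below is not the paper's optimized method but a plain NumPy rendering of standard dynamic routing, illustrating where the unshareable intermediate variables (the per-pair prediction vectors `u_hat` and coupling coefficients) and the per-iteration synchronization points arise. Shapes and names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(s, axis=-1):
    # Squashing non-linearity: short vectors shrink toward 0,
    # long vectors approach unit length.
    sq = (s ** 2).sum(axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + 1e-9)

def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement over prediction vectors.

    u_hat: (num_in, num_out, dim_out) -- one prediction vector per
    (input capsule, output capsule) pair; this large, pair-specific
    tensor is the kind of unshareable intermediate state the abstract
    refers to.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))              # routing logits
    for _ in range(iterations):                  # each iteration ends in a global sync
        c = softmax(b, axis=1)                   # coupling coefficients per input capsule
        s = (c[..., None] * u_hat).sum(axis=0)   # weighted sum -> (num_out, dim_out)
        v = squash(s)                            # output capsule vectors
        b = b + (u_hat * v[None]).sum(axis=-1)   # agreement update needs all of v
    return v
```

Because every update of `b` depends on the freshly computed output vectors `v`, the iterations cannot overlap, which is why GPU execution of this loop is synchronization-bound rather than compute-bound.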