A CNN Inference Micro-benchmark for Performance Analysis and Optimization on GPUs

Published: 01 Jan 2022, Last Modified: 10 Nov 2025 · SMC 2022 · CC BY-SA 4.0
Abstract: Optimizing per-layer and/or total inference time without accuracy loss in convolutional neural networks (CNNs) is crucial for resource-constrained Edge-AI devices and embedded systems. To this end, this work 1) introduces a CNN inference micro-benchmark (mbNet) for performance analysis and optimization, and 2) proposes a simple yet effective performance model for adaptive kernel selection that optimizes per-layer CNN inference time. Since the convolutional layer is the computational core of CNNs, the two mainstream convolution strategies, unrolling-based convolution (UNROLL) and direct convolution (DIRECT), are implemented, compared, and analyzed in terms of per-layer convolution time. Using the data obtained from mbNet, we build an accurate and interpretable tree-based performance model, with which our adaptive kernel selection approach improves convolution performance by up to $5.4\times$ ($2.7\times$ and $1.6\times$ speedup on average over the default UNROLL and DIRECT, respectively).
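The abstract names the two convolution strategies without defining them, so the sketch below illustrates the general idea. It is a minimal NumPy illustration (single image, stride 1, no padding), not the paper's GPU implementation; the function names are hypothetical.

```python
# Illustrative sketch of the two convolution strategies compared in mbNet.
import numpy as np

def direct_conv(x, w):
    """DIRECT: loop over every output position and accumulate directly."""
    C, H, W = x.shape
    K, _, R, S = w.shape            # K filters, each of shape C x R x S
    Ho, Wo = H - R + 1, W - S + 1
    y = np.zeros((K, Ho, Wo))
    for k in range(K):
        for i in range(Ho):
            for j in range(Wo):
                y[k, i, j] = np.sum(x[:, i:i+R, j:j+S] * w[k])
    return y

def unroll_conv(x, w):
    """UNROLL: im2col flattens each receptive field into a column,
    turning the whole convolution into a single matrix multiply (GEMM)."""
    C, H, W = x.shape
    K, _, R, S = w.shape
    Ho, Wo = H - R + 1, W - S + 1
    cols = np.empty((C * R * S, Ho * Wo))
    for i in range(Ho):
        for j in range(Wo):
            cols[:, i * Wo + j] = x[:, i:i+R, j:j+S].ravel()
    return (w.reshape(K, -1) @ cols).reshape(K, Ho, Wo)

x = np.random.rand(3, 8, 8)         # toy input: 3 channels, 8x8
w = np.random.rand(4, 3, 3, 3)      # 4 filters of shape 3x3x3
assert np.allclose(direct_conv(x, w), unroll_conv(x, w))
```

Both produce identical results; they differ only in memory traffic and arithmetic layout, which is why their relative speed varies per layer.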
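The adaptive kernel selection can be pictured as follows: a small decision tree trained on per-layer timing data predicts which kernel to launch for each layer. The features, training data, and library choice (scikit-learn) here are illustrative assumptions, not the authors' model.

```python
# Hypothetical sketch of tree-based adaptive kernel selection;
# features and labels are illustrative, not taken from the paper.
from sklearn.tree import DecisionTreeClassifier

# Per-layer shape parameters a micro-benchmark like mbNet could log:
# (in_channels, out_channels, input_size, kernel_size, stride)
layers = [
    [  3,  64, 224, 7, 2],
    [ 64, 128,  56, 3, 1],
    [256, 512,  14, 3, 1],
    [512, 512,   7, 3, 1],
]
# Label = the faster kernel measured offline: 0 -> UNROLL, 1 -> DIRECT
faster = [0, 0, 1, 1]

# A shallow tree keeps the model interpretable, as the abstract claims.
tree = DecisionTreeClassifier(max_depth=3).fit(layers, faster)

def select_kernel(layer_params):
    """Return the convolution kernel predicted to be faster for a layer."""
    return "DIRECT" if tree.predict([layer_params])[0] == 1 else "UNROLL"

print(select_kernel([128, 256, 28, 3, 1]))  # e.g. 'UNROLL' or 'DIRECT'
```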