Optimizing Deep Neural Network Precision for Processing-in-Memory: A Memory Bottleneck Perspective

Published: 01 Jan 2025 · Last Modified: 19 May 2025 · ICEIC 2025 · CC BY-SA 4.0
Abstract: This paper presents a detailed measurement of memory bottlenecks in processing-in-memory (PIM) systems running deep neural networks (DNNs) at two precisions (INT8 and FP32), using memory-bottleneck metrics. We examine the impact of INT8, which improves data-movement efficiency for DNNs, to determine which precision is better suited to a PIM system. The results show that INT8 alleviates the overall memory bottleneck; however, the LLC MPKI of the computationally intensive Softmax layer increases from 3.459 to 16.725, and the LFMR of the FC layer decreases only slightly, from 99.795% to 99.483%, so considerable further improvement is hard to expect from precision reduction alone. For this reason, processing the Softmax and FC layers in PIM is anticipated to significantly enhance performance when targeting INT8 DNN models.
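As a point of reference for the figures quoted above, MPKI (misses per kilo-instructions) is a standard cache-pressure metric; the sketch below shows its conventional definition. The sample miss and instruction counts are illustrative assumptions, not measurements from the paper.

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per kilo-instructions: cache misses normalized
    per 1,000 retired instructions."""
    return misses / (instructions / 1000)

# Illustrative (assumed) counts: 3,459 LLC misses over 1,000,000
# retired instructions yield an MPKI of 3.459, matching the scale
# of the Softmax figure quoted in the abstract.
print(mpki(3_459, 1_000_000))
```

A rise in LLC MPKI under INT8, as reported for Softmax, indicates that the last-level cache absorbs fewer of that layer's accesses, which is the motivation for offloading it to PIM.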