Q-PIM: A Genetic Algorithm based Flexible DNN Quantization Method and Application to Processing-In-Memory Platform

Yun Long, Edward Lee, Daehyun Kim, Saibal Mukhopadhyay

Published: 2020, Last Modified: 07 Mar 2025, DAC 2020, CC BY-SA 4.0

Abstract: This paper presents a genetic algorithm (GA) based, training-free, layer-wise quantization method, named GAQ, to reduce the model complexity of arbitrary DNN architectures. The proposed algorithm formulates an optimization problem that determines the quantization level for each DNN layer under a constraint on the maximum accuracy degradation, and uses a genetic algorithm to solve this problem at the inference stage of any pre-trained DNN model. Experimental results on various DNNs for image classification demonstrate 5x to 17x weight compression with insignificant (< 2%) accuracy loss, comparable to existing quantization algorithms, which typically require multi-pass retraining and handcrafted tuning. To evaluate the computational benefits of GAQ, we present an SRAM-based, flexible-precision, all-digital processing-in-memory (PIM) architecture, named Q-PIM, that leverages GAQ to optimally control the precision of each DNN layer and thereby enhance efficiency. Simulation in 28nm CMOS shows potential for significant energy and latency advantages over fixed-precision PIM architectures.
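The layer-wise search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the GA operators, so elitist selection, single-point crossover, and uniform mutation are assumed here. The layer sizes, sensitivities, and the `accuracy_drop` proxy are hypothetical stand-ins for evaluating a real pre-trained model on validation data; only the 2% accuracy budget comes from the abstract.

```python
import random

# Hypothetical layers: (number of weights, sensitivity to quantization).
LAYERS = [(1_000, 4.0), (50_000, 1.0), (200_000, 0.5), (10_000, 2.0)]
BIT_CHOICES = [2, 3, 4, 5, 6, 7, 8]   # candidate per-layer bit widths (assumed)
MAX_ACC_DROP = 2.0                    # percent, the "< 2%" budget from the abstract
FULL_PRECISION_BITS = 32


def accuracy_drop(bits):
    """Toy proxy for accuracy loss; a real run would evaluate the quantized model."""
    return sum(4.0 * s * 2.0 ** (-b) for (_, s), b in zip(LAYERS, bits))


def model_bits(bits):
    """Total storage, in bits, of all weights at the given per-layer precision."""
    return sum(n * b for (n, _), b in zip(LAYERS, bits))


def compression_rate(bits):
    """Compression relative to storing every weight at full precision."""
    total_weights = sum(n for n, _ in LAYERS)
    return FULL_PRECISION_BITS * total_weights / model_bits(bits)


def fitness(bits):
    """Sort key, lower is better: feasible candidates first, then smaller models."""
    excess = max(0.0, accuracy_drop(bits) - MAX_ACC_DROP)
    return (excess, model_bits(bits))


def search(pop_size=40, generations=60, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(LAYERS)
    # Include the all-8-bit candidate so the population starts with a feasible one.
    pop = [[max(BIT_CHOICES)] * n]
    pop += [[rng.choice(BIT_CHOICES) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]        # elitist selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)         # single-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n):                # uniform mutation of bit widths
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(BIT_CHOICES)
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)


if __name__ == "__main__":
    best = search()
    print("bits per layer:", best)
    print("accuracy drop (toy proxy):", round(accuracy_drop(best), 3))
    print("compression rate:", round(compression_rate(best), 2))
```

Because no retraining is involved, the only expensive step in such a search is the accuracy evaluation inside the fitness function; in this sketch it is replaced by a closed-form proxy so that the example runs instantly.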