MASL-AFU: A High Memory Access Efficiency 2-D Scalable LUT-Based Activation Function Unit for On-Device DNN Training
Abstract: On-device deep neural network (DNN) training faces constraints in storage capacity and energy supply. Existing works primarily focus on optimizing the training of convolutional and batch normalization (BN) layers to improve the compute-to-communication (CTC) ratio and reduce the energy cost of off-chip memory access (MA). However, the training of activation layers remains challenging due to the additional off-chip MA required for derivative calculations. This article proposes MASL-AFU, an architecture designed to accelerate the activation layer in on-device DNN training. MASL-AFU leverages nonuniform piecewise linear (NUPWL) functions to speed up the forward propagation (FP) in the activation layer. During the error propagation (EP) process, retrieving derivatives from a lookup table (LUT) eliminates the need for redundant retrieval of the input data used in FP. By storing LUT indices instead of the original activation inputs, MASL-AFU significantly reduces and accelerates MA. Compared to other activation function units, MASL-AFU offers up to a $5.8\times$ increase in computational and off-chip MA efficiency. In addition, MASL-AFU incorporates two dimensions of scalability: data precision and the number of LUT entries. These scalable, hardware-friendly methods enhance MASL-AFU's area efficiency by up to $3.24\times$ and energy efficiency by up to $3.85\times$.
DOI: 10.1109/TVLSI.2024.3488782
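The abstract's key dataflow idea can be illustrated with a minimal NumPy sketch. The sketch below assumes a hypothetical set of nonuniform breakpoints and per-segment slope/intercept LUT entries (the paper's actual fitted coefficients, fixed-point formats, and hardware LUT organization are not given here). The point it demonstrates is the one stated in the abstract: the forward pass evaluates a nonuniform piecewise linear (NUPWL) approximation and saves only the small LUT segment index, so the backward (EP) pass reads the derivative straight from the LUT instead of re-fetching the full-precision activation inputs from off-chip memory.

```python
import numpy as np

# Hypothetical LUT contents: 7 nonuniform breakpoints define 8 segments.
# Real entries would be fitted offline to the target activation function.
BREAKPOINTS = np.array([-3.0, -1.5, -0.5, 0.0, 0.5, 1.5, 3.0])
SLOPES      = np.array([0.00, 0.05, 0.20, 0.60, 0.90, 1.00, 1.00, 1.00])  # one per segment
INTERCEPTS  = np.array([0.00, 0.15, 0.38, 0.58, 0.43, 0.28, 0.28, 0.00])  # one per segment

def nupwl_forward(x):
    """FP: y = slope[i] * x + intercept[i] for the segment i containing x.
    Only the small integer index i is kept for the backward pass."""
    idx = np.searchsorted(BREAKPOINTS, x)        # nonuniform segment lookup
    y = SLOPES[idx] * x + INTERCEPTS[idx]
    return y, idx.astype(np.uint8)               # store LUT indices, not inputs

def nupwl_backward(grad_out, idx):
    """EP: the local derivative is the stored segment's slope, retrieved from
    the LUT with the saved index -- no re-read of the FP input tensor."""
    return grad_out * SLOPES[idx]

# Usage example with random activations.
x = np.random.randn(8).astype(np.float32)
y, idx = nupwl_forward(x)
grad_in = nupwl_backward(np.ones_like(y), idx)
```

Replacing the saved full-precision inputs with narrow segment indices is what shrinks the off-chip MA volume during EP; the precision of the indices and the number of LUT entries are exactly the two scalability dimensions the abstract mentions.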