An On-Chip-Training Keyword-Spotting Chip Using Interleaved Pipeline and Computation-in-Memory Cluster in 28-nm CMOS

Published: 01 Jan 2025, Last Modified: 09 Nov 2025 · IEEE Trans. Very Large Scale Integr. Syst. 2025 · CC BY-SA 4.0
Abstract: To improve the precision of keyword spotting (KWS) for individual users on edge devices, we propose an on-chip-training KWS (OCT-KWS) chip that protects private data while also achieving ultralow-power inference. Our main contributions are: 1) identity-interchange and interleaved-pipeline methods for backpropagation (BP), which enable pipelined execution of operations that traditionally had to run sequentially, reducing the cache requirement for loss values by 95.8%; 2) an all-digital isolated-bitline (BL)-based computation-in-memory (CIM) macro, which eliminates ineffective computations caused by glitches, achieving $2.03\times$ higher energy efficiency; and 3) a multisize CIM-cluster-based BP data flow, which coordinates the CIM macros so that all of them remain fully utilized at all times, reducing output feature map (Ofmap) accesses by 47.2%. Fabricated in 28-nm CMOS, this chip achieves both the highest training energy efficiency of 101.5 TOPS/W and the lowest inference energy of 9.9 nJ/decision among current KWS chips. By retraining a three-class depthwise-separable convolutional neural network (DSCNN), detection accuracy on the private dataset increases from 80.8% to 98.9%.
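As context for the DSCNN mentioned above: a depthwise-separable convolution factors one dense $k \times k$ convolution into a per-channel depthwise pass plus a $1 \times 1$ pointwise pass, which is why such networks suit ultralow-power KWS inference. The sketch below counts multiply-accumulate (MAC) operations for both forms using made-up layer shapes; the actual layer dimensions of the chip's DSCNN are not given in this abstract.

```python
# Illustrative MAC-count comparison for a standard convolution versus a
# depthwise-separable one. All shapes here are hypothetical examples,
# not taken from the paper.

def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution over an h x w feature map."""
    return h * w * c_in * c_out * k * k

def ds_conv_macs(h, w, c_in, c_out, k):
    """MACs for the depthwise (k x k per channel) + pointwise (1 x 1) pair."""
    depthwise = h * w * c_in * k * k        # one k x k filter per input channel
    pointwise = h * w * c_in * c_out        # 1 x 1 mixing across channels
    return depthwise + pointwise

if __name__ == "__main__":
    # Example shapes only.
    h, w, c_in, c_out, k = 16, 16, 64, 64, 3
    std = standard_conv_macs(h, w, c_in, c_out, k)
    ds = ds_conv_macs(h, w, c_in, c_out, k)
    print(f"standard: {std} MACs, depthwise-separable: {ds} MACs "
          f"({std / ds:.1f}x fewer)")
```

For these example shapes the separable form needs roughly an eighth of the MACs, which is the kind of saving that makes on-chip retraining and sub-10-nJ/decision inference plausible on an edge device.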