BooLUT-CIM: A 52.4 TOPS/W Radix-4 Booth-LUT Digital CIM With Negative-Magnitude-Bits Inversion Storage

Yi Yang, Xiao Tan, Jinwu Chen, Yucheng Du, Tianhui Jiao, Xing Wang, An Guo, Xin Si

Published: 2025, Last Modified: 16 Jan 2026. IEEE Trans. Circuits Syst. II Express Briefs 2025. License: CC BY-SA 4.0

Abstract: Compute-in-Memory (CIM) is an effective approach to boosting the efficiency of AI hardware by reducing data movement. Digital CIM (DCIM) has garnered extensive attention owing to its immunity to PVT variations and its high precision. Recently, a lookup table (LUT)-based DCIM architecture was proposed that stores the precomputed sums of adjacent weights in memory, thereby alleviating the substantial overhead of adder trees in traditional DCIM designs. However, the efficiency of LUT-based DCIM is difficult to improve further because of its bit-serial input method and power-hungry LUT readout operations. To address this challenge, this brief presents BooLUT-CIM, a design that combines the advantages of LUT-based CIM and the radix-4 Booth algorithm, halving the number of lookup and addition operations. A LUT compression method derived from computation-friendly numerical relationships is used to reduce circuit overhead. Additionally, a negative-magnitude-bits inversion storage scheme is proposed to reduce inefficient discharge, saving 38% of LUT array readout power on average. A prototype of this design is implemented in 28 nm CMOS technology for validation. Simulation results show that the macro achieves an energy efficiency of 52.4 TOPS/W at 0.9 V and 400 MHz, which is $1.3\times$ higher than prior art.
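
The halving of lookup and addition operations can be illustrated with a small software model. The sketch below is not the circuit described in the brief: it assumes a LUT shared by a pair of weights, 8-bit signed inputs, and hypothetical function names chosen for illustration. It compares a bit-serial LUT dot product (one lookup per input bit) with a radix-4 Booth variant (one lookup per Booth digit). The sign-symmetry trick used to shrink the Booth LUT is only one obvious numerical relationship and is not necessarily the compression method the authors use. The inversion storage scheme is not modeled.

```python
def to_bits(x, bits):
    """Two's-complement bits of x, least significant bit first."""
    return [(x >> i) & 1 for i in range(bits)]


def booth_radix4_digits(x, bits=8):
    """Radix-4 Booth digits of x (each in -2..2), least significant first."""
    b = to_bits(x, bits)
    digits, prev = [], 0
    for i in range(0, bits, 2):
        digits.append(b[i] + prev - 2 * b[i + 1])
        prev = b[i + 1]
    return digits


def bit_serial_lut_dot(w, x, bits=8):
    """Baseline model: one LUT read per input bit position."""
    w0, w1 = w
    lut = {(0, 0): 0, (1, 0): w0, (0, 1): w1, (1, 1): w0 + w1}
    b0, b1 = to_bits(x[0], bits), to_bits(x[1], bits)
    acc, lookups = 0, 0
    for i in range(bits):
        partial = lut[(b0[i], b1[i])]
        lookups += 1
        # The MSB of a two's-complement number carries negative weight.
        sign = -1 if i == bits - 1 else 1
        acc += sign * partial * (1 << i)
    return acc, lookups


def booth_lut_dot(w, x, bits=8):
    """Booth model: one LUT read per radix-4 digit (bits / 2 reads)."""
    w0, w1 = w
    # Assumed compression: keep only digit pairs that are >= (0, 0) in
    # lexicographic order (13 of 25 entries); the rest follow by negation.
    lut = {
        (d0, d1): d0 * w0 + d1 * w1
        for d0 in range(0, 3)
        for d1 in range(-2, 3)
        if (d0, d1) >= (0, 0)
    }
    acc, lookups = 0, 0
    digit_pairs = zip(booth_radix4_digits(x[0], bits),
                      booth_radix4_digits(x[1], bits))
    for i, (d0, d1) in enumerate(digit_pairs):
        if (d0, d1) >= (0, 0):
            partial = lut[(d0, d1)]
        else:
            partial = -lut[(-d0, -d1)]
        lookups += 1
        acc += partial * 4 ** i
    return acc, lookups


if __name__ == "__main__":
    weights, inputs = (3, -5), (-77, 100)
    print(bit_serial_lut_dot(weights, inputs))
    print(booth_lut_dot(weights, inputs))
```

Both functions return the same dot product, but the Booth variant performs `bits / 2` lookups and accumulations instead of `bits`, at the cost of a larger LUT per weight pair, which is why some form of LUT compression matters.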

External IDs: dblp:journals/tcasII/YangTCDJWGS25