Abstract: Reconfigurable constant coefficient multipliers are a promising way to improve energy efficiency and reduce hardware cost (e.g., FPGA LUTs or DSPs) in CNN accelerators. Unfortunately, in ASIC implementations they can incur area overhead because they require multiple adders. To address this problem, this study proposes a reconfigurable one-adder multiplication design that maps CNN parameters onto only 35 possible eight-bit coefficient representations. In addition, we introduce a training framework that recovers the accuracy of a model quantized to the proposed coefficient set. Experimental results on MobileNet V2 show that, compared with an eight-bit generic processing element (PE) design, our approach reduces area and power consumption by 24.4% and 35.1%, respectively, while degrading accuracy on the ImageNet dataset by only 0.58%.
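The abstract does not list the 35 coefficients or their encoding. The following is a minimal sketch of the general idea behind a one-adder constant multiplier. It assumes that each coefficient can be written as s1*2^a + s2*2^b with s1, s2 in {-1, 0, +1}. This is an illustrative assumption, not the paper's stated encoding. Under that assumption, x*c needs only shifts, which are wiring in hardware, plus a single adder/subtractor. The helper names `one_adder_multiply`, `build_coefficient_set`, and `quantize_weight` are hypothetical, and the set they produce is larger than the paper's 35 entries.

```python
def one_adder_multiply(x: int, s1: int, a: int, s2: int, b: int) -> int:
    """Compute x * (s1*2**a + s2*2**b) using two shifts and one add/subtract.

    Assumption (not from the paper): every coefficient has at most two
    signed power-of-two terms, so a single adder suffices.
    """
    term1 = s1 * (x << a)  # shifted copy of x (free wiring in hardware)
    term2 = s2 * (x << b)  # second shifted copy of x
    return term1 + term2   # the single adder/subtractor


def build_coefficient_set(max_shift: int = 7, lo: int = -128, hi: int = 127) -> dict:
    """Map each representable eight-bit coefficient to one (s1, a, s2, b) encoding."""
    table = {}
    for s1 in (-1, 0, 1):
        for s2 in (-1, 0, 1):
            for a in range(max_shift + 1):
                for b in range(max_shift + 1):
                    value = s1 * (1 << a) + s2 * (1 << b)
                    if lo <= value <= hi:
                        # Keep the first encoding found for each value.
                        table.setdefault(value, (s1, a, s2, b))
    return table


def quantize_weight(w: int, table: dict) -> int:
    """Snap an integer weight to the nearest representable coefficient (ties -> smaller)."""
    return min(table, key=lambda c: (abs(c - w), c))


if __name__ == "__main__":
    coeffs = build_coefficient_set()
    print(len(coeffs), "coefficients under this two-term assumption")
    c = quantize_weight(11, coeffs)  # 11 = 8 + 2 + 1 needs two adders
    print(11, "->", c, "; 5 * c =", one_adder_multiply(5, *coeffs[c]))
```

A weight such as 11 needs three power-of-two terms, so it cannot be produced with one adder. The sketch snaps it to a neighboring representable value. A retraining step, like the framework described above, would then be needed to recover the accuracy lost to this rounding.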