NeuroBlend: Towards Low-Power yet Accurate Neural Network-Based Inference Engine Blending Binary and Fixed-Point Convolutions

Arash Fayyazi, Mahdi Nazemi, Arya Fayyazi, Massoud Pedram

Published: 01 Jan 2024, Last Modified: 17 Feb 2025. Venue: ACM Great Lakes Symposium on VLSI 2024. License: CC BY-SA 4.0

Abstract: This paper introduces NeuroBlend, a novel neural network architecture built from a unique building block called the Blend module. This module uses binary convolutions in its main path and fixed-point convolutions in its skip path. Batch normalization layers are judiciously placed on both the main and skip paths inside the Blend module and between consecutive Blend modules. Additionally, we present a compiler and hardware architecture designed to map NeuroBlend models onto FPGA devices, aiming to minimize inference latency while maintaining high accuracy. Our NeuroBlend-20 (NeuroBlend-18) model, derived from ResNet-20 (ResNet-18) and trained on CIFAR-10 (CIFAR-100), achieves 88.0% (73.73%) classification accuracy, outperforming state-of-the-art binary neural networks by 0.8% (1.33%), with an inference time of 0.38 ms per image, 1.4x faster than a previous FPGA implementation for BNNs. Similarly, our BlendMixer model for CIFAR-10 attains 90.6% accuracy (1.59% less than the full-precision MLPMixer) with a 3.5x reduction in model size compared to the full-precision MLPMixer. Furthermore, leveraging DSP blocks for 48-bit bitwise logic operations enables a low-power FPGA implementation, yielding a 2.5x reduction in power consumption.
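
As an illustration of why wide bitwise logic suits binary convolutions, the following minimal Python sketch computes a dot product of two {-1, +1} vectors using XNOR and popcount on 48-bit words. The abstract does not spell out how the DSP blocks are used, so this assumes the standard XNOR-popcount formulation common to binary neural networks; the names `pack_signs` and `binary_dot` are hypothetical and not taken from the paper.

```python
WORD_BITS = 48  # word width matching the 48-bit bitwise logic mentioned in the abstract


def pack_signs(values):
    """Pack a sequence of +1/-1 values into 48-bit integers (bit 1 encodes +1)."""
    words = []
    for start in range(0, len(values), WORD_BITS):
        word = 0
        for offset, v in enumerate(values[start:start + WORD_BITS]):
            if v > 0:
                word |= 1 << offset
        words.append(word)
    return words


def binary_dot(a_words, w_words, length):
    """Dot product of two {-1, +1} vectors from their packed words.

    Assumption: standard XNOR-popcount identity, dot = 2 * matches - length,
    where matches counts positions in which both vectors have the same sign.
    """
    matches = 0
    remaining = length
    for a, w in zip(a_words, w_words):
        valid = min(remaining, WORD_BITS)   # the last word may be partially filled
        mask = (1 << valid) - 1             # ignore padding bits
        xnor = ~(a ^ w) & mask              # 1 wherever the signs agree
        matches += bin(xnor).count("1")     # popcount
        remaining -= valid
    return 2 * matches - length
```

Calling `binary_dot(pack_signs(a), pack_signs(w), len(a))` should agree with the ordinary sum of elementwise products of `a` and `w`. Each word-level XNOR handles up to 48 multiply-accumulate positions at once, which is the kind of operation a 48-bit logic unit can evaluate in a single step.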