Abstract: Binarized neural networks (BNNs) are one of the most promising approaches for deploying deep neural network models on resource-constrained devices. However, there is very little compiler and programmable-accelerator support for BNNs, especially for modern BNNs that use scale factors and skip connections to maximize network performance. In this paper, we present a set of methods to extend a neural processing unit (NPU) and a compiler to support modern BNNs. Our novel ideas include (i) batch-norm folding for binarized layers with scale factors and skip connections, (ii) efficient handling of convolutions with few input channels, and (iii) bit-packing pipelining. Our evaluation using BiRealNet-18 on an FPGA board demonstrates that our compiler-architecture hybrid approach yields significant speedups for binary convolution layers over the baseline NPU. Our approach also achieves 3.6-5.5$\times$ better end-to-end performance on BiRealNet-18 than previous BNN compiler approaches.
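To give a flavor of idea (i), the following NumPy sketch shows how batch-norm parameters and a positive per-channel scale factor can be folded into a single threshold for the sign activation that follows a binary convolution. The function and parameter names (alpha, gamma, beta, mu, var) are illustrative assumptions, not the paper's exact formulation, and the sketch omits the skip-connection case the paper also handles.

```python
import numpy as np

def fold_bn_into_sign_threshold(alpha, gamma, beta, mu, var, eps=1e-5):
    # Sketch (assumed names): sign(gamma * (alpha*y - mu) / sqrt(var + eps) + beta)
    # reduces to  y >= tau  when gamma > 0, and  y <= tau  when gamma < 0.
    sigma = np.sqrt(var + eps)
    tau = (mu - beta * sigma / gamma) / alpha   # assumes alpha > 0, gamma != 0
    flip = gamma < 0                            # comparison flips for negative gamma
    return tau, flip

def binarize_with_folded_bn(y, tau, flip):
    # One comparison per element replaces the scale, batch-norm, and sign ops.
    cond = np.where(flip, y <= tau, y >= tau)
    return np.where(cond, 1.0, -1.0)

# Usage: per-channel parameters broadcast over a (C, H, W) feature map.
C, H, W = 4, 8, 8
y = np.random.randn(C, H, W)
alpha = np.abs(np.random.randn(C, 1, 1)) + 0.1
gamma = np.random.randn(C, 1, 1)
beta, mu = np.random.randn(C, 1, 1), np.random.randn(C, 1, 1)
var = np.abs(np.random.randn(C, 1, 1))
tau, flip = fold_bn_into_sign_threshold(alpha, gamma, beta, mu, var)
out = binarize_with_folded_bn(y, tau, flip)
reference = np.sign(gamma * (alpha * y - mu) / np.sqrt(var + 1e-5) + beta)
assert np.allclose(out[reference != 0], reference[reference != 0])
```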