Abstract: Modern image recognition models span millions of parameters, occupying several megabytes and sometimes gigabytes of storage, which makes them difficult to run on resource-constrained edge hardware. Binary Neural Networks (BNNs) address this problem by reducing the memory requirement to a single bit per weight and/or activation, with corresponding reductions in computation and power consumption. Nevertheless, each neuron in such networks has a large number of inputs, making them difficult to implement efficiently in binary hardware accelerators, especially LUT-based ones.
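To make the memory argument concrete, here is a minimal Python sketch (not the paper's implementation) of the one-bit weight encoding described above: full-precision weights are mapped to {-1, +1} with the sign function and then packed eight per byte. The function names and the NumPy-based packing are illustrative assumptions.

```python
import numpy as np

def binarize_weights(w: np.ndarray) -> np.ndarray:
    """Map full-precision weights to {-1, +1} via the sign function."""
    return np.where(w >= 0, 1, -1).astype(np.int8)

def pack_bits(w_bin: np.ndarray) -> np.ndarray:
    """Pack {-1, +1} weights into one bit each (+1 -> bit 1, -1 -> bit 0)."""
    bits = (w_bin.flatten() > 0).astype(np.uint8)
    return np.packbits(bits)

# Illustrative numbers: 4096 float32 weights take 16 KiB at full
# precision but only 512 bytes once binarized and bit-packed (32x).
w = np.random.randn(4096).astype(np.float32)
packed = pack_bits(binarize_weights(w))
print(w.nbytes, "->", packed.nbytes, "bytes")  # 16384 -> 512 bytes
```

The 32x factor is the generic storage saving of binarization relative to float32; the far larger 190-2200x reductions reported below come from combining binarization with the pruning proposed in this work.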
In this work, we present a pruning algorithm, and associated results, for the convolutional and dense layers of such binary networks. We reduce computation by 4-70x and memory by 190-2200x with less than 2% accuracy loss on MNIST and less than 3% accuracy loss on CIFAR-10, compared to full-precision, fully connected equivalents. Compared to very recent work on pruning binary networks, we still gain 1% in accuracy and reduce memory by up to 30% (526 KiB vs. 750 KiB).