Abstract: There are many application scenarios for which the computational
performance and memory footprint of the prediction phase of Deep
Neural Networks (DNNs) need to be optimized. Binary Deep Neural
Networks (BDNNs) have been shown to be an effective way of achieving
this objective. In this paper, we show how Convolutional Neural
Networks (CNNs) can be implemented using binary
representations. Espresso is a compact, yet powerful
library written in C/CUDA that features all the functionalities
required for the forward propagation of CNNs, in a binary file less
than 400KB, without any external dependencies. Although it is mainly
designed to take advantage of massive GPU parallelism, Espresso also
provides an equivalent CPU implementation for CNNs. Espresso
provides special convolutional and dense layers for binary CNNs (BCNNs),
leveraging bit-packing and bit-wise computations
for efficient execution. These techniques speed up
matrix-multiplication routines and, at the same time, reduce memory
usage when storing parameters and activations. We experimentally
show that Espresso is significantly faster than existing
implementations of optimized binary neural networks (by roughly two
orders of magnitude). Espresso is released under the Apache 2.0
license and is available at http://github.com/organization/project.
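As an illustration of the bit-packing and bit-wise computations the abstract refers to, below is a minimal C sketch of the XNOR-popcount dot product that binary layers typically build their matrix multiplications on. The function name and packing layout are illustrative assumptions, not Espresso's actual API.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical XNOR-popcount dot product over bit-packed {-1,+1} vectors:
 * bit 1 encodes +1, bit 0 encodes -1, so 64 values fit in one word.
 * dot(a,b) = matches - mismatches = 2*popcount(XNOR(a,b)) - nbits. */
static int bin_dot(const uint64_t *a, const uint64_t *b,
                   int nwords, int nbits)
{
    int matches = 0;
    for (int i = 0; i < nwords; ++i)
        matches += __builtin_popcountll(~(a[i] ^ b[i])); /* XNOR counts agreements */
    return 2 * matches - nbits; /* assumes nbits is a multiple of 64 */
}

int main(void)
{
    uint64_t a[1] = { ~0ULL };                  /* sixty-four +1 values */
    uint64_t b[1] = { 0xAAAAAAAAAAAAAAAAULL };  /* alternating +1/-1    */
    printf("dot = %d\n", bin_dot(a, b, 1, 64)); /* prints: dot = 0 */
    return 0;
}
```

Packing 64 weights or activations per machine word is what yields the memory savings, and replacing 64 multiply-accumulates with one XOR, one negation, and one popcount is the source of the matrix-multiplication speed-up.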
TL;DR: A state-of-the-art implementation of binary neural networks in terms of computational performance
Keywords: binary deep neural networks, optimized implementation, bitwise computations