- Abstract: Traditionally, researchers thought that high-precision weights were crucial for training neural networks with gradient descent. However, recent research has obtained a finer understanding of the role of precision in neural network weights. One can train a NN with binary weights and activations at train time by augmenting the weights with a high-precision continuous latent variable that accumulates small changes from stochastic gradient descent. However, there is a dearth of theoretical analysis to explain why we can effectively capture the features in our data with binary weights and activations. Our main result is that the neural networks with binary weights and activations trained using the Courbariaux, Hubara et al. (2016) method work because of the high-dimensional geometry of binary vectors. In particular, the continuous vectors that extract out features in these BNNs are well-approximated by binary vectors in the sense that dot products are approximately preserved. Compared to previous research that demonstrated the viability of such BNNs, our work explains why these BNNs work in terms of the geometry of high-dimensional binary vectors. Our theory serves as a foundation for understanding not only BNNs but networks that make use of low precision weights and activations. Furthermore, a better understanding of multilayer binary neural networks serves as a starting point for generalizing BNNs to other neural network architectures such as recurrent neural networks.
- TL;DR: Theory and experiments explaining how Binary Neural Networks work based on the geometry of high-dimensional binary vectors
- Conflicts: berkeley.edu
- Keywords: Theory, Deep learning