Ternary Weight Decomposition and Binary Activation Encoding for Fast and Compact Neural Network

Mitsuru Ambai, Takuya Matsumoto, Takayoshi Yamashita, Hironobu Fujiyoshi

Nov 02, 2016 (modified: Nov 03, 2016) ICLR 2017 conference submission readers: everyone
  • Abstract: This paper aims to reduce test-time computational load of a deep neural network. Unlike previous methods which factorize a weight matrix into multiple real-valued matrices, our method factorizes both weights and activations into integer and noninteger components. In our method, the real-valued weight matrix is approximated by a multiplication of a ternary matrix and a real-valued co-efficient matrix. Since the ternary matrix consists of three integer values, {-1, 0, +1}, it only consumes 2 bits per element. At test-time, an activation vector that passed from a previous layer is also transformed into a weighted sum of binary vectors, {-1, +1}, which enables fast feed-forward propagation based on simple logical operations: AND, XOR, and bit count. This makes it easier to deploy a deep network on low-power CPUs or to design specialized hardware. In our experiments, we tested our method on three different networks: a CNN for handwritten digits, VGG-16 model for ImageNet classification, and VGG-Face for large-scale face recognition. In particular, when we applied our method to three fully connected layers in the VGG-16, 15x acceleration and memory compression up to 5.2% were achieved with only a 1.43% increase in the top-5 error. Our experiments also revealed that compressing convolutional layers can accelerate inference of the entire network in exchange of slight increase in error.
  • Keywords: Deep learning
  • Conflicts: chubu.ac.jp, d-itlab.co.jp, denso.co.jp, kumamoto-u.ac.jp