Training wide residual networks for deployment using a single bit for each weight

Anonymous

Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone
  • Abstract: For fast and energy-efficient deployment of trained deep neural networks on resource-constrained embedded hardware, each learnt weight parameter should ideally be represented and stored using a single bit. Error rates usually increase when this requirement is imposed. Here, we report methodological innovations that result in large reductions in error rates across multiple datasets for deep convolutional neural networks deployed using a single bit for each weight. The main contribution is to replace the learnt scaling factors applied to the sign of the weights during training with a constant scaling factor that reflects a common initialisation method (sketched in code below). For models with 1-bit weights and 20 convolutional layers, requiring only ~4 MB of parameter memory, we achieve error rates of 3.74% on CIFAR-10 and 18.41% on CIFAR-100. We also considered MNIST, SVHN, and Imagenet32, achieving single-bit-weight test error rates of 0.27%, 1.93%, and 42.92%/19.95% (Top-1/Top-5) respectively. These error rates are about half those previously reported for CIFAR-10/100, and are within 1-3% of our error rates for the same networks with full-precision weights. Using a warm-restart learning-rate schedule, we found training with single-bit weights to be just as fast as for full-precision networks, and achieved about 98%-99% of peak performance in just 62 training epochs for CIFAR-10/100.
  • TL;DR: We train wide residual networks that can be immediately deployed using only a single bit for each convolutional weight, with significantly better accuracy than past methods.
  • Keywords: wide residual networks, model compression, quantization, 1-bit weights
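
The sketch below illustrates the kind of training-time weight binarisation the abstract describes: the forward pass uses the sign of each weight multiplied by a constant, layer-wide scale, while full-precision shadow weights receive the gradient updates. It is not the authors' code; the choice of PyTorch, the class names BinaryConv2d and SignSTE, the use of a straight-through gradient estimator, and the assumption that the constant scale equals the He-initialisation standard deviation sqrt(2/fan_in) (the abstract says only "a common initialisation method") are all illustrative assumptions.

```python
# Minimal sketch (assumptions noted above) of 1-bit-weight training:
# effective weights are scale * sign(w), with a constant per-layer scale.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class SignSTE(torch.autograd.Function):
    """sign() in the forward pass; straight-through (identity) gradient."""

    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output


class BinaryConv2d(nn.Conv2d):
    """Convolution whose effective weights are scale * sign(w).

    The scale is fixed to sqrt(2 / fan_in), i.e. the He-init standard
    deviation (an assumption here, not taken from the paper text).
    """

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__(in_channels, out_channels, kernel_size, **kwargs)
        fan_in = in_channels * kernel_size * kernel_size
        self.scale = math.sqrt(2.0 / fan_in)  # constant, not learnt

    def forward(self, x):
        w_bin = self.scale * SignSTE.apply(self.weight)
        return F.conv2d(x, w_bin, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


if __name__ == "__main__":
    layer = BinaryConv2d(16, 32, 3, padding=1, bias=False)
    x = torch.randn(4, 16, 8, 8)
    out = layer(x)
    out.sum().backward()  # gradients flow to the full-precision weights
    print(out.shape, layer.weight.grad is not None)
```

Under this scheme, deployment needs only the sign bits of each weight tensor plus one constant scale per layer, which is what allows a 20-layer wide residual network to fit in roughly 4 MB of parameter memory.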
