Reviewed Version (pdf): https://openreview.net/references/pdf?id=d6r1vWaoBd
Keywords: CNN, training, quantization, low-bit, energy efficiency
Abstract: In this paper, we propose a low-bit training framework for convolutional neural networks. Our framework focuses on reducing the energy and time consumption of convolution kernels by quantizing all the convolutional operands (activation, weight, and error) to a low bit-width. Specifically, we propose a multi-level scaling (MLS) tensor format, in which the element-wise bit-width can be reduced so far that floating-point computations simplify to nearly fixed-point ones. We then describe the dynamic quantization and the low-bit tensor convolution arithmetic that efficiently leverage the MLS tensor format. Experiments show that our framework achieves a better trade-off between accuracy and bit-width than previous methods. When training ResNet-20 on CIFAR-10, all convolution operands can be quantized to a 1-bit mantissa and a 2-bit exponent while retaining the same accuracy as full-precision training. When training ResNet-18 on ImageNet with a 4-bit mantissa and a 2-bit exponent, our framework incurs an accuracy loss of less than $1\%$. Energy consumption analysis shows that our design can achieve over $6.8\times$ higher energy efficiency than training with floating-point arithmetic.
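To make the idea of multi-level scaling more concrete, the following is a minimal NumPy sketch of simulated ("fake") quantization with three levels: one scale per tensor, one power-of-two scale per group, and a very small floating-point grid per element. It is an illustration only, not the paper's implementation. The function names `quantize_lowbit_float` and `mls_quantize`, the contiguous grouping, the power-of-two group scale, and the exact element grid are all assumptions made for this example; only the default `man_bits=1` and `exp_bits=2` are taken from the CIFAR-10 setting quoted in the abstract.

```python
import numpy as np


def quantize_lowbit_float(x, man_bits, exp_bits):
    """Round values with |x| <= 1 onto a small float-like grid.

    Assumed layout (hypothetical): exponent index e in [0, 2**exp_bits - 1]
    selects the range (2**-(e+1), 2**-e], which is split into 2**man_bits steps.
    Values below the smallest range reuse its step, so they may round to zero.
    """
    mag = np.abs(x)
    max_e = 2 ** exp_bits - 1
    # Clamp tiny magnitudes so log2 stays finite; they fall into the last range.
    safe = np.maximum(mag, 2.0 ** -(max_e + 1))
    e = np.clip(np.floor(-np.log2(safe)), 0, max_e)
    step = 2.0 ** -(e + 1 + man_bits)
    return np.sign(x) * np.round(mag / step) * step


def mls_quantize(x, group_size=4, man_bits=1, exp_bits=2):
    """Simulate a three-level scaled quantization of `x` (result stays float64).

    x.size must be divisible by group_size (assumption: contiguous groups).
    """
    x = np.asarray(x, dtype=np.float64)
    flat = x.reshape(-1, group_size)

    # Level 1: a single full-precision scale for the whole tensor.
    tensor_scale = np.max(np.abs(flat))
    if tensor_scale == 0.0:
        return np.zeros_like(x)

    # Level 2: one scale per group, rounded up to a power of two (assumption)
    # so that applying it would amount to a shift in hardware.
    group_max = np.max(np.abs(flat), axis=1, keepdims=True) / tensor_scale
    group_scale = 2.0 ** np.ceil(np.log2(np.maximum(group_max, 2.0 ** -24)))

    # Level 3: elements, now with magnitude <= 1, on the low-bit grid.
    elems = flat / (tensor_scale * group_scale)
    q = quantize_lowbit_float(elems, man_bits, exp_bits)

    # Dequantize so the result can be compared with the input directly.
    return (q * group_scale * tensor_scale).reshape(x.shape)


if __name__ == "__main__":
    x = np.array([[1.0, 0.5, 0.25, 0.0],
                  [0.1, -0.05, 0.02, 0.01]])
    print(mls_quantize(x))
```

In this sketch the second group contains only small values, and its own scale lets them use the full element grid instead of rounding to zero, which is the intuition behind letting the element-wise bit-width shrink.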
One-sentence Summary: We propose a low-bit training framework with a multi-level scaling (MLS) tensor format that reduces the data bit-width of all convolution inputs during training and improves energy efficiency.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics