Fast Binarized Neural Network Training with Partial Pre-training

28 Sept 2020 (modified: 05 May 2023)
ICLR 2021 Conference Blind Submission
Keywords: binarized neural network, binary, quantized, 1-bit, low precision
Abstract: Binarized neural networks, networks with weights and activations constrained to lie in a 2-element set, allow for more time- and resource-efficient inference than standard floating-point networks. However, binarized neural networks typically take more training to plateau in accuracy than their floating-point counterparts, in terms of both iteration count and wall clock time. We demonstrate a technique, partial pre-training, that allows for faster from-scratch training of binarized neural networks by first training the network as a standard floating-point network for a short amount of time, then converting the network to a binarized neural network and continuing to train from there. Without tuning any hyperparameters across four networks on three different datasets, partial pre-training is able to train binarized neural networks between $1.26\times$ and $1.61\times$ faster than when training a binarized network from scratch using standard low-precision training.
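Illustrative sketch: the two-phase procedure described in the abstract (a short floating-point warm-up, then conversion to binarized training) can be outlined in PyTorch as below. This is a sketch under assumptions, not the paper's implementation: the architecture, the hard-tanh straight-through estimator, the 10% pre-training budget, and names such as MaybeBinaryLinear and partial_pretraining are all illustrative choices.

import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a clipped straight-through gradient estimator."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (hard-tanh STE).
        return grad_output * (x.abs() <= 1).float()


class MaybeBinaryLinear(nn.Linear):
    """Linear layer that runs in float mode or binarizes weights and activations."""

    def __init__(self, in_features, out_features, binary=False):
        super().__init__(in_features, out_features)
        self.binary = binary

    def forward(self, x):
        if self.binary:
            w = BinarizeSTE.apply(self.weight)
            x = BinarizeSTE.apply(x)
        else:
            w = self.weight
        return F.linear(x, w, self.bias)


def make_model(binary=False):
    # Toy MLP for 28x28 inputs; first and last layers kept full precision,
    # as is common in binarized-network training setups.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(784, 256),
        nn.BatchNorm1d(256),
        MaybeBinaryLinear(256, 256, binary=binary),
        nn.BatchNorm1d(256),
        nn.Linear(256, 10),
    )


def train(model, loader, epochs, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            opt.step()


def partial_pretraining(loader, total_epochs=20, pretrain_fraction=0.1):
    # Phase 1: train a standard floating-point network for a short time.
    float_model = make_model(binary=False)
    pre_epochs = max(1, int(total_epochs * pretrain_fraction))
    train(float_model, loader, pre_epochs)

    # Phase 2: convert to a binarized network (same architecture, copied
    # weights, binarization switched on) and continue training from there.
    binary_model = make_model(binary=True)
    binary_model.load_state_dict(float_model.state_dict())
    train(binary_model, loader, total_epochs - pre_epochs)
    return binary_model

Because the binarized model shares the floating-point model's architecture, the conversion step is just a state-dict copy with binarization switched on; the latent real-valued weights continue to be updated while the forward pass uses their binarized values.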
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We demonstrate a technique, partial pre-training, that allows for faster from-scratch training of binarized neural networks.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=4xlKhV7bbA