Training 1-Bit Networks on a Sphere: A Geometric Approach

ICANN (3) 2022 (modified: 17 Nov 2022)
Abstract: Weight binarization offers a promising path towards building highly efficient Deep Neural Networks (DNNs) that can be deployed on low-power, constrained devices. However, given their discrete nature, training 1-bit DNNs is neither a straightforward nor a uniquely defined process, and several strategies have been proposed to address this issue, each yielding performance closer to that of full-precision counterparts. In this paper we analyze 1-bit DNNs from a differential geometry perspective. We start from the observation that, for a given model with $$d$$ binary weights, all possible weight configurations lie on a sphere of radius $$\sqrt{d}$$. Together with the traditional training procedure based on the Straight-Through Estimator (STE), we leverage concepts from Riemannian optimization to constrain the search space to spherical manifolds, a subset of Riemannian manifolds. Our approach offers a principled solution; nevertheless, in practice we find that simply constraining the norm of the underlying auxiliary network works just as effectively. Additionally, we observe that by enforcing a unit norm on the network parameters, the network explores a space of well-conditioned matrices. Complementary to this, we define an angle-based regularization that guides the exploration of the auxiliary space. We binarize a ResNet architecture to demonstrate the effectiveness of our approach on image classification with the CIFAR-100 and ImageNet datasets.
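
The following is a minimal, hypothetical PyTorch sketch of the two ingredients the abstract names: sign binarization trained with a straight-through estimator, and a latent (auxiliary) weight tensor kept on the sphere of radius $$\sqrt{d}$$ by re-projecting it after every optimizer step. All names (`project_to_sphere`, `BinarizeSTE`, the toy loss) are illustrative assumptions, not the authors' implementation, and the paper's Riemannian machinery and angle-based regularizer are not reproduced here.

```python
import torch

def project_to_sphere(w: torch.Tensor) -> torch.Tensor:
    """Rescale a latent weight tensor so it lies on the sphere of radius
    sqrt(d), where d is the number of weights -- the sphere on which every
    d-dimensional {-1, +1} configuration also lives."""
    d = w.numel()
    return w * (d ** 0.5 / w.norm())

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator (STE):
    forward uses sign(w); backward passes the gradient through,
    clipped to the region |w| <= 1."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Clipped STE: zero the gradient where |w| > 1.
        return grad_output * (w.abs() <= 1).to(grad_output.dtype)

# Usage sketch: keep a full-precision latent weight, binarize it in the
# forward pass, and re-project the latent weight onto the sqrt(d)-sphere
# after each optimizer step (a simple norm constraint / retraction).
latent = torch.randn(64, 64, requires_grad=True)   # hypothetical layer weight
opt = torch.optim.SGD([latent], lr=0.1)

x = torch.randn(8, 64)
w_bin = BinarizeSTE.apply(latent)        # {-1, +1} weights used in the layer
loss = (x @ w_bin.t()).pow(2).mean()     # placeholder loss for illustration
loss.backward()
opt.step()

with torch.no_grad():
    latent.copy_(project_to_sphere(latent))  # keep the auxiliary weights on the sphere
```

In this toy setup the binary weights carry the forward computation, while gradients update the constrained auxiliary weights, mirroring the norm-constrained alternative the abstract reports as working as well as the full Riemannian formulation.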