Abstract: Binarized neural networks (BNNs) offer efficiency advantages but present unique optimization challenges due to the limited representational capacity of binary values. Many contemporary techniques optimize the weights by introducing latent real-valued counterparts within conventional optimization frameworks, approximating the derivative of the potentially stochastic function $\mathrm{round}: \mathbb{R}^n \to \{\pm 1\}^n$. However, some recent works instead train BNNs by flipping weight signs based on the sign and magnitude of accumulated gradients. In this paper, we thoroughly investigate the principles of weight sign reversal and find that training becomes unstable when the accumulated gradients approach the marginal values that trigger a flip. To address this issue, we introduce a probabilistic optimizer that determines weight signs by sampling from a Bernoulli distribution parameterized by the accumulated gradients, thereby enhancing training stability. Furthermore, we reduce the likelihood of sign reversal for weights that flip frequently, which further limits training instability. We conduct an extensive series of experiments on CIFAR-10, CIFAR-100, and TinyImageNet, and our approach demonstrates significant improvements in accuracy compared to state-of-the-art optimizers.
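To make the idea concrete, the following is a minimal sketch (not the paper's actual algorithm) of a Bernoulli-based sign-flip update. All names and hyperparameters here, such as the accumulation buffer `accum`, the flip-count buffer `flip_count`, `threshold`, and `damping`, are illustrative assumptions.

```python
import torch

def probabilistic_sign_flip(weights, grad, accum, flip_count,
                            momentum=0.9, threshold=1e-3, damping=0.1):
    """Illustrative sketch of a Bernoulli-based sign-flip update for binary weights.

    weights    : tensor of {-1, +1} binary weights
    grad       : current gradient w.r.t. the binary weights
    accum      : running accumulation of gradients (same shape as weights)
    flip_count : number of times each weight has already flipped
    These names and hyperparameters are assumptions for illustration only.
    """
    # Accumulate gradients with momentum.
    accum.mul_(momentum).add_(grad)

    # A flip is suggested when the accumulated gradient pushes the weight
    # toward the opposite sign (gradient and weight share the same sign,
    # so the descent direction points away from the current sign).
    wants_flip = (accum * weights) > 0

    # Map the accumulated-gradient magnitude to a flip probability in [0, 1];
    # marginal accumulations yield low probabilities instead of a hard flip.
    p_flip = torch.clamp(accum.abs() / threshold, max=1.0)

    # Penalize weights that have flipped often to reduce oscillation.
    p_flip = p_flip / (1.0 + damping * flip_count)

    # Sample the flip decisions from a Bernoulli distribution.
    do_flip = wants_flip & (torch.rand_like(p_flip) < p_flip)

    weights[do_flip] *= -1
    flip_count[do_flip] += 1
    accum[do_flip] = 0.0  # reset accumulation after a flip
    return weights
```

In this sketch, accumulated gradients that only marginally reach the flip condition translate into low flip probabilities rather than deterministic flips, which reflects the stability intuition described in the abstract.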