- TL;DR: signal propagation theory applied to continuous surrogates of binary nets; counterintuitive initialisation; reparameterisation trick not helpful
- Abstract: The training of stochastic neural network models with binary ($\pm1$) weights and activations via continuous surrogate networks is investigated. We derive, using mean field theory, a set of scalar equations describing how input signals propagate through surrogate networks. The equations reveal that, depending on the choice of surrogate model, the networks may or may not exhibit an order-to-chaos transition, and that depth scales exist which limit the maximum trainable depth. Specifically, in solving the equations for edge-of-chaos conditions, we show that surrogates derived using the Gaussian local reparameterisation trick have no critical initialisation, whereas deterministic surrogates based on analytic Gaussian integration do. The theory is applied to a range of binary neuron and weight design choices, such as different neuron noise models, allowing the categorisation of algorithms in terms of their behaviour at initialisation. Moreover, we predict theoretically, and confirm numerically, that common weight initialisation schemes used in standard continuous networks, when applied to the mean values of the stochastic binary weights, yield poor training performance. This study shows that, contrary to common intuition, the means of the stochastic binary weights should be initialised close to $\pm 1$ for deeper networks to be trainable.
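
As a rough illustration of why the scale of the weight means matters, the sketch below uses only an elementary fact: a $\pm 1$ weight with mean $m$ has variance $1 - m^2$. Small means, as produced by standard continuous initialisation schemes, therefore give nearly maximal per-weight noise, while means near $\pm 1$ give nearly deterministic weights. This is not the paper's mean field derivation; the function names and the chosen values of $m$ are assumptions made for illustration.

```python
import numpy as np


def sample_binary_weights(mean, size, rng):
    """Draw +/-1 weights whose expectation equals `mean` (requires |mean| <= 1)."""
    # P(w = +1) = (1 + mean) / 2 gives E[w] = mean.
    p_plus = (1.0 + mean) / 2.0
    return np.where(rng.random(size) < p_plus, 1.0, -1.0)


def binary_weight_variance(mean):
    """Variance of a +/-1 variable with the given mean: E[w^2] - E[w]^2 = 1 - mean^2."""
    return 1.0 - mean ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative means only: from 'standard small init' (near 0) to near +1.
    for m in (0.0, 0.5, 0.9, 0.99):
        w = sample_binary_weights(m, 200_000, rng)
        print(
            f"mean={m:4.2f}  empirical var={w.var():.4f}  "
            f"analytic var={binary_weight_variance(m):.4f}"
        )
```

The empirical variance should track $1 - m^2$ closely, shrinking towards zero as $m$ approaches $\pm 1$.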