Neural Synaptic Balance

13 May 2024 (modified: 06 Nov 2024) · Submitted to NeurIPS 2024 · CC BY-NC-ND 4.0
Keywords: neural networks, deep learning, activation functions, regularizations, scaling, neural balance
TL;DR: The paper presents a theory of neural synaptic balance based on systematic relationships between the input weights and the output weights of neurons.
Abstract: For a given additive cost function $R$ (regularizer), a neuron is said to be in balance if the total cost of its input weights is equal to the total cost of its output weights. The basic example is provided by feedforward layered networks of ReLU units trained with $L_2$ regularizers, which exhibit balance after proper training. We develop a general theory that extends this phenomenon in three broad directions: (1) activation functions; (2) regularizers, including all $L_p$ ($p>0$) regularizers; and (3) architectures (non-layered, recurrent, convolutional, mixed activations). Gradient descent on the error function alone does not in general converge to a balanced state, one in which every neuron is in balance, even when starting from such a state. However, gradient descent on the regularized error function must converge to a balanced state, and thus network balance can be used to assess learning progress. The theory is based on two local neuronal operations: scaling, which is commutative, and balancing, which is not. Finally, and most importantly, given any initial set of weights, when local balancing operations are applied to individual neurons in a stochastic manner, global order always emerges: the stochastic algorithm converges to the same unique set of balanced weights. This convergence follows from the existence of an underlying strictly convex optimization problem whose variables are constrained to a linear manifold that depends only on the architecture. The theory is corroborated by simulations on benchmark data sets. Balancing operations are entirely local and thus physically plausible in biological and neuromorphic networks.
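
The balance condition and the two local operations described in the abstract can be illustrated for a single ReLU unit with an $L_2$ regularizer. The sketch below is an illustrative toy example only, not the paper's code; the function and variable names (`l2_cost`, `balance`, `w_in`, `w_out`) are hypothetical. It shows that scaling a unit's input weights by $\lambda > 0$ and its output weights by $1/\lambda$ leaves the unit's function unchanged (positive homogeneity of ReLU), and that choosing $\lambda = (\lVert w_{out}\rVert / \lVert w_{in}\rVert)^{1/2}$ equalizes the $L_2$ costs of the input and output weights, i.e. balances the neuron.

```python
import numpy as np

def l2_cost(w):
    # L2 regularizer cost of a weight vector: sum of squared weights.
    return float(np.sum(w ** 2))

def neuron_output(x, w_in, w_out):
    # Contribution of a single hidden ReLU unit to the next layer:
    # ReLU of the weighted input, fanned out through the output weights.
    return np.maximum(w_in @ x, 0.0) * w_out

def balance(w_in, w_out):
    # Scaling maps (w_in, w_out) -> (lam * w_in, w_out / lam) for lam > 0,
    # which preserves the unit's input-output function.
    # lam = sqrt(||w_out|| / ||w_in||) makes the two L2 costs equal.
    lam = (np.linalg.norm(w_out) / np.linalg.norm(w_in)) ** 0.5
    return lam * w_in, w_out / lam

rng = np.random.default_rng(0)
w_in, w_out = rng.normal(size=4), rng.normal(size=3)
x = rng.normal(size=4)

b_in, b_out = balance(w_in, w_out)
print(l2_cost(w_in), l2_cost(w_out))    # generally unequal before balancing
print(l2_cost(b_in), l2_cost(b_out))    # equal after balancing
print(np.allclose(neuron_output(x, w_in, w_out),
                  neuron_output(x, b_in, b_out)))  # True: function preserved
```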
Supplementary Material: zip
Primary Area: Deep learning architectures
Submission Number: 7583