The minimum-norm gauge for deep learning

TMLR Paper 1048 Authors

10 Apr 2023 (modified: 17 Sept 2024) · Withdrawn by Authors · CC BY 4.0
Abstract: Feedforward neural networks with homogeneous activation functions possess a gauge symmetry: the functions they compute do not change when the incoming and outgoing weights at any hidden unit are rescaled by reciprocal positive values. There are other important properties of these networks, however, that are not invariant under such transformations. For example, networks with highly unbalanced weights may be slower to train or harder to compare and interpret. We describe a simple procedure for gauge-fixing in homogeneous networks; this procedure computes multiplicative rescaling factors (one at each hidden unit) that rebalance the weights of these networks without changing the end-to-end functions that they compute. Specifically, given an initial network with arbitrary weights, the procedure determines the functionally equivalent network whose weights are as small as possible (as measured by their $\ell_p$-norm); this transformed network also has the property that the norms of incoming and outgoing weights at each hidden unit are exactly balanced. The rescaling factors that perform this transformation are found by solving a convex optimization, and we derive simple multiplicative updates that provably converge to its solution. Next, we analyze the optimization landscape in deep networks and derive conditions under which this minimum-norm solution is preserved during learning. Finally, we explore the effects of gauge-fixing on the speed and outcomes of learning in deep networks by stochastic gradient descent. On multiple classification problems, we find that gauge-fixing leads to faster descent in the regularized log-loss.
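
The sketch below is our own illustration, not the authors' code. Assuming a one-hidden-layer ReLU network (weights `W1`, `w2` are made up for the example), it checks the gauge symmetry described in the abstract and applies the per-unit rebalancing that minimizes the $\ell_2$-norm in this decoupled single-layer case; deep networks couple the rescaling factors across layers and require the convex optimization the paper describes.

```python
# Minimal sketch (not the paper's algorithm): gauge symmetry and per-unit
# rebalancing in a one-hidden-layer ReLU network.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 5, 8
W1 = rng.normal(size=(d_hidden, d_in))   # incoming weights of each hidden unit
w2 = rng.normal(size=(d_hidden,))        # outgoing weight of each hidden unit

def forward(W1, w2, x):
    """f(x) = w2 . relu(W1 x)."""
    return w2 @ np.maximum(W1 @ x, 0.0)

x = rng.normal(size=d_in)

# Gauge symmetry: scaling unit i's incoming weights by c_i > 0 and its
# outgoing weight by 1/c_i leaves f unchanged, since relu(c z) = c relu(z).
c = rng.uniform(0.5, 2.0, size=d_hidden)
assert np.allclose(forward(W1, w2, x), forward(c[:, None] * W1, w2 / c, x))

# Minimum-norm gauge (p = 2): with a single hidden layer the units decouple,
# and minimizing c_i^2 ||w1_i||^2 + |w2_i|^2 / c_i^2 over c_i > 0 gives
# c_i = sqrt(|w2_i| / ||w1_i||), which exactly balances the incoming and
# outgoing norms at each unit.
c_star = np.sqrt(np.abs(w2) / np.linalg.norm(W1, axis=1))
W1_bal, w2_bal = c_star[:, None] * W1, w2 / c_star

assert np.allclose(forward(W1, w2, x), forward(W1_bal, w2_bal, x))
assert np.allclose(np.linalg.norm(W1_bal, axis=1), np.abs(w2_bal))  # balanced
print("squared norm before:", np.sum(W1**2) + np.sum(w2**2))
print("squared norm after: ", np.sum(W1_bal**2) + np.sum(w2_bal**2))
```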
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Jeffrey_Pennington1
Submission Number: 1048