Breaking Random-Init Symmetry: Theory-Informed Initialization for ReLU Networks
Keywords: linear mode connectivity, weight-space symmetry, permutation symmetry, initialization, constructive approximation, symmetry breaking, ReLU networks, Git Re-Basin
TL;DR: We build first-layer weights from domain theory of the target, which breaks permutation symmetry at initialization: two networks trained from different seeds converge to solutions that Git Re-Basin matches with the identity permutation.
Abstract: Standard random initialization is permutation-symmetric: every assignment of neuron indices to functional roles is equally likely, so a trained network is only one of many symmetry-equivalent representatives. We propose Theory-Init, an initialization that selects a single canonical representative by construction. Placing deterministic rows in fixed first-layer slots, propagating them through middle layers of the form $I+\Delta$ with small Gaussian $\Delta$, and reading out with a small random output layer breaks the permutation symmetry directly at initialization, regardless of which deterministic rows are used. When those rows additionally encode domain theory of the target, the same construction also yields an accuracy advantage on physics-rich tasks. We give a theoretical account of why neuron indices are approximately preserved through ReLU layers, consisting of a moment identity, a worst-case $L^2$ bound, and a high-probability near-isometry of the middle-block Jacobian. Empirically, across four regression tasks and a small CNN on MNIST and CIFAR-10, two Theory-Init networks trained from different seeds converge to nearby points in weight space, whereas He-initialized pairs end up an order of magnitude further apart; Git Re-Basin returns the identity permutation between two Theory-Init networks but a non-trivial one between two He-initialized networks. Theory-Init thus offers a route to weight-space symmetry removal that is built in rather than searched for.
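To make the construction concrete, here is a minimal Python sketch (NumPy and SciPy) under stated assumptions; it is not the authors' implementation. The names `theory_rows`, `theory_init`, `he_init`, `weight_matching`, `is_identity` and `distance` are hypothetical, and several details are assumed: the networks are bias-free MLPs with 2-D inputs, every first-layer slot holds a deterministic row, the placeholder "theory" rows are unit directions at evenly spaced angles, and the scales `delta_std` and `out_std` are illustrative. `weight_matching` is a simplified version of the weight-matching procedure from Git Re-Basin (coordinate ascent over per-layer linear assignment problems, starting from the identity). The check compares two networks at initialization only; it trains nothing and does not reproduce the reported results.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def theory_rows(width):
    # Hypothetical stand-in for domain-theory rows: 2-D unit directions at evenly spaced angles.
    angles = 2 * np.pi * np.arange(width) / width
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)


def theory_init(rows, depth, d_out, seed, delta_std=1e-2, out_std=1e-2):
    # Weights are stored as (fan_out, fan_in); no biases (assumption).
    rng = np.random.default_rng(seed)
    width = rows.shape[0]
    layers = [rows.copy()]  # first layer: every slot deterministic (assumption)
    for _ in range(depth):  # middle layers: I + small Gaussian Delta
        layers.append(np.eye(width) + delta_std * rng.standard_normal((width, width)))
    layers.append(out_std * rng.standard_normal((d_out, width)))  # small random readout
    return layers


def he_init(d_in, width, depth, d_out, seed):
    # Baseline: every layer drawn from N(0, 2 / fan_in).
    rng = np.random.default_rng(seed)
    shapes = [(width, d_in)] + [(width, width)] * depth + [(d_out, width)]
    return [rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
            for fan_out, fan_in in shapes]


def weight_matching(A, B, max_iter=100, seed=0):
    # Simplified Git Re-Basin weight matching: coordinate ascent over hidden layers.
    # perms[l][i] is the unit of B matched to unit i of A in hidden layer l.
    rng = np.random.default_rng(seed)
    n_hidden = len(A) - 1
    perms = [np.arange(A[l].shape[0]) for l in range(n_hidden)]
    for _ in range(max_iter):
        improved = False
        for l in rng.permutation(n_hidden):
            # Incoming weights of B, inputs reordered by the previous hidden layer's permutation.
            w_in = B[l] if l == 0 else B[l][:, perms[l - 1]]
            # Outgoing weights of B, outputs reordered by the next hidden layer's permutation.
            w_out = B[l + 1] if l == n_hidden - 1 else B[l + 1][perms[l + 1]]
            sim = A[l] @ w_in.T + A[l + 1].T @ w_out  # sim[i, j]: A unit i vs B unit j
            _, cand = linear_sum_assignment(sim, maximize=True)
            idx = np.arange(len(cand))
            # Accept only strict improvements, so the loop terminates.
            if sim[idx, cand].sum() > sim[idx, perms[l]].sum() + 1e-12:
                perms[l] = cand
                improved = True
        if not improved:
            break
    return perms


def is_identity(perms):
    return all(np.array_equal(p, np.arange(len(p))) for p in perms)


def distance(A, B):
    # Euclidean distance between two networks, treating all weights as one flat vector.
    return float(np.sqrt(sum(np.sum((a - b) ** 2) for a, b in zip(A, B))))


rows = theory_rows(16)
theory_a = theory_init(rows, depth=2, d_out=1, seed=0)
theory_b = theory_init(rows, depth=2, d_out=1, seed=1)
he_a = he_init(2, 16, depth=2, d_out=1, seed=0)
he_b = he_init(2, 16, depth=2, d_out=1, seed=1)
for name, (a, b) in {"Theory-Init": (theory_a, theory_b), "He": (he_a, he_b)}.items():
    print(name, "distance:", round(distance(a, b), 3),
          "identity match:", is_identity(weight_matching(a, b)))
```

Because the first layer is identical across seeds and every middle layer is close to $I$, the identity is the strict optimum of each per-layer assignment problem for the Theory-Init pair, while for the He pair the assignment steps almost surely move away from the identity. Whether this alignment survives training is the empirical question the abstract addresses.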
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 38