Weight-space symmetry in neural network loss landscapes revisited

Berfin Simsek; Johanni Brea; Bernd Illing; Wulfram Gerstner

Weight-space symmetry in neural network loss landscapes revisited

Berfin Simsek, Johanni Brea, Bernd Illing, Wulfram Gerstner

25 Sept 2019 (modified: 05 May 2023)ICLR 2020 Conference Blind SubmissionReaders: Everyone

TL;DR: Weight-space symmetry in neural network landscapes gives rise to numerous number of saddles and flat high-dimensional subspaces.

Abstract: Neural network training depends on the structure of the underlying loss landscape, i.e. local minima, saddle points, flat plateaus, and loss barriers. In relation to the structure of the landscape, we study the permutation symmetry of neurons in each layer of a deep neural network, which gives rise not only to multiple equivalent global minima of the loss function but also to critical points in between partner minima. In a network of $d-1$ hidden layers with $n_k$ neurons in layers $k = 1, \ldots, d$, we construct continuous paths between equivalent global minima that lead through a `permutation point' where the input and output weight vectors of two neurons in the same hidden layer $k$ collide and interchange. We show that such permutation points are critical points which lie inside high-dimensional subspaces of equal loss, contributing to the global flatness of the landscape. We also find that a permutation point for the exchange of neurons $i$ and $j$ transits into a flat high-dimensional plateau that enables all $n_k!$ permutations of neurons in a given layer $k$ at the same loss value. Moreover, we introduce higher-order permutation points by exploiting the hierarchical structure in the loss landscapes of neural networks, and find that the number of $K$-th order permutation points is much larger than the (already huge) number of equivalent global minima -- at least by a polynomial factor of order $K$. In two tasks, we demonstrate numerically with our path finding method that continuous paths between partner minima exist: first, in a toy network with a single hidden layer on a function approximation task and, second, in a multilayer network on the MNIST task. Our geometric approach yields a lower bound on the number of critical points generated by weight-space symmetries and provides a simple intuitive link between previous theoretical results and numerical observations.

Keywords: Weight-space symmetry, neural network landscapes

Original Pdf: pdf

9 Replies

Loading