Keywords: minimizing parameter l2 norm, representation cost, implicit bias
Abstract: We provide a function space characterization of the inductive bias resulting from minimizing the $\ell_2$ norm of the weights in multi-channel linear convolutional networks. We define an \textit{induced regularizer} in the function space as the minimum $\ell_2$ norm of the weights of a network required to realize a given function. For two-layer linear convolutional networks with $C$ output channels and kernel size $K$, we show the following: (a) If the inputs to the network have a single channel, the induced regularizer for any $K$ is \textit{independent} of the number of output channels $C$. Furthermore, we show that this induced regularizer is a norm given by a semidefinite program (SDP). (b) In contrast, for networks with multi-channel inputs, multiple output channels can be necessary to merely realize all matrix-valued linear functions, and thus the inductive bias \emph{does} depend on $C$. However, for sufficiently large $C$, the induced regularizer is again given by an SDP that is independent of $C$. In particular, the induced regularizers for $K=1$ and $K=D$ are given in closed form as the nuclear norm and the $\ell_{2,1}$ group-sparse norm, respectively, of the Fourier coefficients.
We investigate the applicability of our theoretical results to a broader class of ReLU convolutional networks through experiments on the MNIST and CIFAR-10 datasets.
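The closed-form cases can be illustrated numerically. The sketch below is not from the paper's code; it assumes one plausible reading of "Fourier coefficients" as the per-frequency $C_{\text{out}} \times C_{\text{in}}$ coefficient matrices of a circular-convolution linear map, and all helper names are hypothetical. It evaluates the $\ell_{2,1}$ group-sparse norm over frequency groups (the $K=D$ closed form under this interpretation) and the nuclear norm of the $1\times 1$ mixing matrix (the $K=1$ case, where the coefficient matrix is the same at every frequency).

```python
import numpy as np

def fourier_coefficient_matrices(w):
    """w: filters of shape (C_out, C_in, D) defining the circular convolution
    y_c = sum_c' w[c, c'] * x[c'].  Returns an array of shape (D, C_out, C_in)
    whose slice [k] is the coefficient matrix acting on frequency k.
    (Hypothetical helper for illustration.)"""
    return np.moveaxis(np.fft.fft(w, axis=-1), -1, 0)

def group_l21_norm(W_hat):
    """l_{2,1} group-sparse norm with frequencies as groups: the sum over
    frequencies of the Frobenius norm of each coefficient matrix."""
    return sum(np.linalg.norm(W_hat[k], "fro") for k in range(W_hat.shape[0]))

def nuclear_norm(M):
    """Nuclear norm (sum of singular values) of a mixing matrix; for K = 1
    the coefficient matrix is this same matrix at every frequency."""
    return np.linalg.norm(M, "nuc")

# Toy usage with random filters (D = 8 positions, 3 input / 4 output channels).
rng = np.random.default_rng(0)
w_full = rng.standard_normal((4, 3, 8))   # K = D: full-width kernels
print(group_l21_norm(fourier_coefficient_matrices(w_full)))

W_mix = rng.standard_normal((4, 3))       # K = 1: a single 1x1 mixing matrix
print(nuclear_norm(W_mix))
```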
One-sentence Summary: We study the function space view of minimizing the l2 norm of the weights in multi-channel linear convolutional networks, uncovering an invariance to the number of output channels.
Supplementary Material: zip