Abstract: We study the interplay between memorization and generalization of
overparametrized networks in the extreme case of a single training example.
The learning task is to predict an output which is as similar as possible to
the input. We examine both fully-connected and convolutional networks that are
initialized randomly and then trained to minimize the reconstruction
error. The trained networks converge to one of two solutions: the constant
function (``memorization'') or the identity function (``generalization''). We
show that different architectures exhibit vastly different inductive biases
towards memorization and generalization.
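The setup can be illustrated with a minimal sketch (not the authors' code): an overparametrized fully-connected network is fit to a single example with a reconstruction (MSE) loss, and the trained map is then probed on fresh inputs to see whether it behaves more like the identity or like the constant function returning the training point. The architecture, layer widths, dimensions, and optimizer settings below are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 64                                   # input/output dimension (assumed)
x = torch.randn(1, d)                    # the single training example

# Overparametrized fully-connected network (widths are illustrative)
model = nn.Sequential(
    nn.Linear(d, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, d),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Fit the single example to near-zero reconstruction error
for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x), x)
    loss.backward()
    opt.step()

# Probe on unseen inputs: an "identity" solution tracks each new input,
# a "memorizing" solution keeps outputting something close to x.
with torch.no_grad():
    x_new = torch.randn(8, d)
    y_new = model(x_new)
    err_identity = (y_new - x_new).pow(2).mean().item()   # small if generalizing
    err_constant = (y_new - x).pow(2).mean().item()       # small if memorizing
    print(f"distance to identity: {err_identity:.3f}, "
          f"distance to constant (training point): {err_constant:.3f}")
```

Comparing the two distances gives a simple empirical readout of which inductive bias the trained network expresses on out-of-sample inputs.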