Abstract: While modern (deep) Neural Networks (NNs), with their large number of parameters, have the
ability to memorize training data, they achieve surprisingly high accuracies on test sets. One
theory that could explain this behavior is based on the manifold hypothesis: real-world high-
dimensional input data lies near low-dimensional manifolds. An NN layer transforms the input
manifold into a so-called representation manifold. The NN learns transformations
that flatten and disentangle these manifolds layer by layer; in this way, it learns the
structure of the data instead of memorizing it. Under the manifold hypothesis, we demonstrate
that flat manifolds (affine linear subspaces) in the second-to-last layer of a classification
network ensure perfect classification performance in the noiseless case. In regression tasks,
we derive an upper bound on the generalization error which decreases as the input manifold
becomes flatter. In the case of almost flat manifolds, the bound can be modified to be even
lower. These results support the argument that flat input manifolds improve generalization.
Moreover, we argue that the same results can be used to show that flatter representation
manifolds improve generalization. Further, we conduct numerical experiments showing that
these findings apply beyond the strict theoretical assumptions. Based on our results, we argue
that a flatness-promoting regularizer, combined with an $L_1$-regularizer, could enhance the
generalization of Neural Networks.
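As a rough illustration of this closing suggestion (not the implementation studied in the paper), the PyTorch sketch below combines a hypothetical flatness-promoting penalty on the second-to-last-layer representations, measured as the energy of a batch of centered representations outside their best rank-$k$ affine fit, with an $L_1$ penalty on the weights. The model, the rank `k`, and the weights `lambda_flat` and `lambda_l1` are illustrative assumptions.

```python
# Minimal sketch, assuming a flatness penalty based on trailing singular values
# of the centered representation matrix; all hyperparameters are hypothetical.
import torch
import torch.nn as nn

def flatness_penalty(reps: torch.Tensor, k: int) -> torch.Tensor:
    """Sum of squared singular values beyond the top k of the centered batch of
    representations (shape: batch x dim). Zero iff the batch lies on a
    k-dimensional affine subspace, i.e. a flat manifold of dimension k."""
    centered = reps - reps.mean(dim=0, keepdim=True)
    s = torch.linalg.svdvals(centered)          # singular values, descending
    return (s[k:] ** 2).sum() / reps.shape[0]

def l1_penalty(model: nn.Module) -> torch.Tensor:
    # Plain L1 norm of all parameters.
    return sum(p.abs().sum() for p in model.parameters())

# Usage sketch: `backbone` maps inputs to the second-to-last layer,
# `head` is the final linear classifier (all sizes are illustrative).
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
head = nn.Linear(16, 3)
criterion = nn.CrossEntropyLoss()
lambda_flat, lambda_l1 = 1e-2, 1e-4

x, y = torch.randn(128, 32), torch.randint(0, 3, (128,))
reps = backbone(x)
loss = (criterion(head(reps), y)
        + lambda_flat * flatness_penalty(reps, k=2)
        + lambda_l1 * (l1_penalty(backbone) + l1_penalty(head)))
loss.backward()
```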
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Benjamin_Guedj1
Submission Number: 6640