- TL;DR: Data augmentation provides an implicit regularization of the rugosity or "roughness" of the learned function of a deep network.
- Abstract: Deep (neural) networks have been applied productively in a wide range of supervised and unsupervised learning tasks. Unlike classical machine learning algorithms, deep networks typically operate in the overparameterized regime, where the number of parameters is larger than the number of training data points. Consequently, understanding the generalization properties and the role of (explicit or implicit) regularization in these networks is of great importance. In this work, we explore how the oft-used heuristic of data augmentation imposes an implicit regularization penalty of a novel measure of the rugosity or “roughness” based on the tangent Hessian of the function fit to the training data.
- Keywords: deep networks, implicit regularization, Hessian, rugosity, curviness, complexity