Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
Keywords: gradient descent, intrinsic dimension of data, rank, convolutional neural network
Abstract: Modern neural networks are usually highly over-parameterized. Behind the wide usage of over-parameterized networks is the belief that, if the data are simple, then the trained network will automatically be equivalent to a simple predictor. Following this intuition, many existing works have studied different notions of the "rank" of neural networks and its relation to the rank of the data. In this work, we study the rank of convolutional neural networks (CNNs) trained by gradient descent, with a specific focus on the robustness of this rank to noise in the data. Specifically, we point out that, when noise is added to the data inputs, the rank of the CNN trained with gradient descent is affected far less than the rank of the data, and even when a significant amount of noise has been added, the CNN filters can still effectively recover the intrinsic dimension of the clean data. We back up our claim with a theoretical case study in which we consider data points consisting of "signals" and "noises" and rigorously prove that CNNs trained by gradient descent learn the intrinsic dimension of the data signals.
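The following is a minimal illustrative sketch (not the paper's code) of the rank comparison the abstract describes: adding noise inflates the numerical rank of the data matrix well beyond its intrinsic dimension, whereas the quantity the paper studies is the rank of the trained CNN's filters. All names, noise levels, and thresholds below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, intrinsic_dim = 200, 100, 5

# Clean data: n points lying in a low-dimensional "signal" subspace of R^d.
basis = rng.standard_normal((d, intrinsic_dim))
clean = rng.standard_normal((n, intrinsic_dim)) @ basis.T

# Noisy data: the same points plus isotropic Gaussian noise.
noisy = clean + 0.5 * rng.standard_normal((n, d))

def numerical_rank(mat, tol=1e-3):
    """Count singular values above tol times the largest singular value."""
    s = np.linalg.svd(mat, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

print("numerical rank of clean data:", numerical_rank(clean))  # close to intrinsic_dim
print("numerical rank of noisy data:", numerical_rank(noisy))  # inflated toward min(n, d)

# For a trained CNN, the analogous quantity would be the numerical rank of the
# flattened filter matrix, e.g. conv_weights.reshape(num_filters, -1); the paper's
# claim is that this rank stays close to intrinsic_dim even when the data rank does not.
```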
Student Paper: Yes
Submission Number: 64