Keywords: Neural Network Dynamics, Feature Learning, Optimization
Abstract: Deep neural networks exhibit rich training dynamics under gradient descent updates. These dynamics stem from the non-convex optimization of deep neural networks, which has been studied extensively in recent theoretical work. However, previous works either ignored the training trajectory or analyzed only a few gradient descent steps in a non-asymptotic manner, yielding an incomplete characterization of the network's stage-wise learning behavior and of the evolution of its parameters and outputs. In this work, we characterize how a network's feature learning unfolds during training in a regression setting. We analyze the dynamics of two quantities of a two-layer linear network: the projection of the first-layer weights onto the feature vector, and the second-layer weights. The former indicates how well the network fits the feature vector of the input data, while the latter represents the magnitude learned by the network. More importantly, by formulating the dynamics of these two quantities as a non-linear system, we give a precise characterization of the training trajectory, demonstrating the rich feature learning dynamics of the linear neural network. Moreover, we establish a connection between the feature learning dynamics and the neural tangent kernel, illustrating the presence of feature learning beyond lazy training. Experimental simulations corroborate our theoretical findings, confirming the validity of our conclusions.
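To make the tracked quantities concrete, the following is a minimal simulation sketch, not the paper's method: it assumes a teacher with a single unit feature direction `v` and magnitude `c_star`, a two-layer linear network with a scalar second-layer weight `a`, and small initialization, and it records the projection of the first-layer weights onto `v` together with `a` along the gradient descent trajectory. All names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: targets generated from one feature direction v with magnitude c_star.
d, n = 50, 500                      # input dimension, number of samples (assumed)
v = rng.normal(size=d)
v /= np.linalg.norm(v)              # unit feature vector
c_star = 2.0                        # target magnitude
X = rng.normal(size=(n, d))
y = c_star * X @ v                  # noiseless regression targets

# Two-layer linear network f(x) = a * (w @ x) with a scalar second layer.
w = 1e-3 * rng.normal(size=d)       # first-layer weights, small initialization
a = 1e-3                            # second-layer weight
lr, steps = 0.05, 400

proj_hist, a_hist = [], []
for t in range(steps):
    pred = a * (X @ w)
    resid = pred - y
    # Gradients of the mean-squared error 0.5/n * ||pred - y||^2.
    grad_w = a * (X.T @ resid) / n
    grad_a = (X @ w) @ resid / n
    w -= lr * grad_w
    a -= lr * grad_a
    proj_hist.append(float(w @ v))  # projection of first-layer weights onto the feature vector
    a_hist.append(float(a))

print(f"final w.v = {proj_hist[-1]:.3f}, final a = {a_hist[-1]:.3f}, "
      f"product = {proj_hist[-1] * a_hist[-1]:.3f} (target magnitude {c_star})")
```

In this sketch the two recorded curves typically show a plateau followed by rapid growth and saturation, with the product of the two quantities approaching the target magnitude, which is the kind of stage-wise trajectory the abstract refers to.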
Submission Number: 86