Exact learning dynamics of deep linear networks with prior knowledge

29 Sept 2022 · OpenReview Archive Direct Upload
Abstract: Learning in deep neural networks is known to depend critically on the knowledge embedded in the initial network weights. However, few theoretical results have precisely linked prior knowledge to learning dynamics. Here we derive exact solutions to the dynamics of learning with rich prior knowledge in deep linear networks by generalising Fukumizu's matrix Riccati solution (Fukumizu, 1998). While simple, deep linear networks retain a non-convex loss landscape and nonlinear learning dynamics that depend in detail on the initial weights of the network. We obtain explicit expressions for the evolving network function, hidden representational similarity, and neural tangent kernel over training for a broad class of initialisations and tasks. We characterise a class of task-independent initialisations that radically alters learning dynamics from slow step-like to fast exponential trajectories while converging to identical representational similarity, dissociating learning trajectories from the structure of internal representations. We discuss the implications of this finding for neural network weight initialisation schemes, continual learning and learning of structured knowledge. Finally, we characterise how network weights dynamically align with task structure, rigorously justifying why previous solutions successfully described learning from small weights without incorporating their fine-scale structure. Taken together, our results provide a mathematical toolkit for understanding the impact of prior knowledge on deep learning.
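
The abstract contrasts slow, step-like learning from small random weights with fast, exponential trajectories under a class of task-independent initialisations. The sketch below is a minimal NumPy illustration of that phenomenon, not the paper's method: it trains a two-layer linear network with small-step gradient descent (approximating gradient flow) and tracks the singular values of the product map. The "rich" balanced orthogonal initialisation is a hypothetical stand-in for the initialisation class analysed in the paper; the task, dimensions, and learning rate are likewise assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: whitened inputs (Sigma_x = I) and a low-rank input-output
# correlation matrix Sigma_yx = E[y x^T] with known singular values.
d_in, d_hid, d_out = 8, 8, 8
U, _ = np.linalg.qr(rng.standard_normal((d_out, d_out)))
V, _ = np.linalg.qr(rng.standard_normal((d_in, d_in)))
s_star = np.array([4.0, 2.0, 1.0] + [0.0] * (d_in - 3))  # target singular values
Sigma_yx = U @ np.diag(s_star) @ V.T

def train(W1, W2, lr=1e-3, steps=20000):
    """Small-step gradient descent on L = 1/2 ||Sigma_yx - W2 W1||_F^2,
    which (since Sigma_x = I) matches the expected regression loss up to
    a constant. Returns the top-3 singular values of W2 W1 over training."""
    svals = []
    for _ in range(steps):
        E = Sigma_yx - W2 @ W1           # error in the end-to-end map
        W1 += lr * W2.T @ E              # gradient flow: dW1/dt = W2^T E
        W2 += lr * E @ W1.T              # gradient flow: dW2/dt = E W1^T
        svals.append(np.linalg.svd(W2 @ W1, compute_uv=False)[:3])
    return np.array(svals)

# Small random initialisation: known to give slow, step-like (sigmoidal)
# mode-by-mode learning of the singular values of Sigma_yx.
W1_small = 1e-3 * rng.standard_normal((d_hid, d_in))
W2_small = 1e-3 * rng.standard_normal((d_out, d_hid))

# Hypothetical task-independent "rich" initialisation: balanced orthogonal
# weights with O(1) spectrum (W1 W1^T = W2^T W2 = I), standing in for the
# initialisation class characterised in the paper.
Q, _ = np.linalg.qr(rng.standard_normal((d_hid, d_hid)))
W1_rich, W2_rich = Q.copy(), Q.T.copy()

s_small = train(W1_small, W2_small)
s_rich = train(W1_rich, W2_rich)
print("final singular values (small init):", s_small[-1])
print("final singular values (rich  init):", s_rich[-1])
```

Plotting the rows of `s_small` and `s_rich` against training steps should show plateau-then-jump sigmoidal curves for the small initialisation and smooth, roughly exponential relaxation for the balanced one, with both runs converging to the same target singular values, consistent with the abstract's dissociation of learning trajectories from final representations.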