TL;DR: We study the learning dynamics of linear RNNs using a novel framework that accounts for task dynamics.
Abstract: Recurrent neural networks (RNNs) are powerful models used widely in both machine learning and neuroscience to learn tasks with temporal dependencies and to model neural dynamics. However, despite significant advancements in the theory of RNNs, there is still limited understanding of their learning process and the impact of the temporal structure of data. Here, we bridge this gap by analyzing the learning dynamics of linear RNNs (LRNNs) analytically, enabled by a novel framework that accounts for task dynamics. Our mathematical results reveal four key properties of LRNNs: (1) Learning of data singular values is ordered by both scale and temporal precedence, such that singular values that are larger and occur later are learned faster. (2) Task dynamics impact solution stability and extrapolation ability. (3) The loss function contains an effective regularization term that incentivizes small weights and mediates a tradeoff between recurrent and feedforward computation. (4) Recurrence encourages feature learning, as shown through a novel derivation of the neural tangent kernel for finite-width LRNNs. As a final proof-of-concept, we apply our theoretical framework to explain the behavior of LRNNs performing sensory integration tasks. Our work provides a first analytical treatment of the relationship between the temporal dependencies in tasks and learning dynamics in LRNNs, building a foundation for understanding how complex dynamic behavior emerges in cognitive models.
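The setting studied in the abstract can be illustrated with a minimal sketch. This is not the paper's code: the dimensions, the random linear teacher task, the initialization scale, and all hyperparameters below are invented for illustration. It trains a linear RNN (h_t = W h_{t-1} + U x_t, readout y_hat = v . h_T) by full-batch gradient descent, with gradients computed exactly via backpropagation through time.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, T, batch = 3, 8, 5, 64  # illustrative sizes, not from the paper

# Hypothetical teacher: a fixed random linear map from the flattened
# input sequence to a scalar target.
teacher = rng.normal(size=(T * n_in,)) / np.sqrt(T * n_in)
X = rng.normal(size=(batch, T, n_in))
y = X.reshape(batch, -1) @ teacher

# Small initialization; scales are arbitrary choices for this sketch.
W = 0.3 * rng.normal(size=(n_hid, n_hid)) / np.sqrt(n_hid)  # recurrent weights
U = 0.3 * rng.normal(size=(n_hid, n_in)) / np.sqrt(n_in)    # input weights
v = 0.3 * rng.normal(size=(n_hid,)) / np.sqrt(n_hid)        # readout weights

def loss_and_grads(W, U, v):
    # Forward pass: h_t = W h_{t-1} + U x_t, prediction y_hat = v . h_T
    hs = [np.zeros((batch, n_hid))]
    for t in range(T):
        hs.append(hs[-1] @ W.T + X[:, t] @ U.T)
    err = hs[-1] @ v - y
    loss = 0.5 * np.mean(err ** 2)
    # Backpropagation through time (exact for the linear network).
    gv = (err[:, None] * hs[-1]).mean(axis=0)
    gW, gU = np.zeros_like(W), np.zeros_like(U)
    delta = err[:, None] * v[None, :]          # dL/dh_T, per sample
    for t in range(T - 1, -1, -1):
        gW += delta.T @ hs[t] / batch
        gU += delta.T @ X[:, t] / batch
        delta = delta @ W                      # dL/dh_t = W^T dL/dh_{t+1}
    return loss, gW, gU, gv

lr = 0.1
initial_loss, *_ = loss_and_grads(W, U, v)
for _ in range(1000):
    loss, gW, gU, gv = loss_and_grads(W, U, v)
    W, U, v = W - lr * gW, U - lr * gU, v - lr * gv
final_loss = loss
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Because the network is linear in its state, the loss is an explicit function of products of W, U, and v, which is what makes the analytical treatment in the paper possible; this sketch only demonstrates the training setup, not the theory.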
Lay Summary: Neural networks and brains seem to show similar behavior when performing certain tasks involving information that changes over time, for example, predicting the landing spot of a tennis ball as it moves. How do brains and neural networks learn these tasks, and how do the types of tasks we learn affect them? In this paper, we study learning in neural networks that can solve time-dependent tasks, called recurrent neural networks (RNNs).
Although RNNs are generally quite complex, with a few assumptions, we derive a simple set of equations that describes the RNN's learning in terms of how well it can perform the task. By studying these equations, we find that tasks that rely on the recent past are learned faster. We can also predict how an RNN will solve a task, how well an RNN will do on a new version of a task, and what an RNN's "connections" will look like, depending on what it is initially trained on.
Although the RNNs we study are much simpler than the brain, understanding how they learn and work can teach us about learning and cognition in general, and some of these insights may apply to more complex systems. This research is a first stepping stone toward describing more complex settings, networks, and behaviors related to learning time-dependent tasks.
Link To Code: https://github.com/aproca/LRNN_dynamics
Primary Area: Theory->Deep Learning
Keywords: deep learning, learning dynamics, RNNs, rich and lazy learning, teacher-student, neuroscience
Submission Number: 11083