Keywords: real-time recurrent learning, online recurrent learning, recurrent neural networks, reinforcement learning, actor-critic, policy gradients
TL;DR: We explore the practical promise of RTRL in settings where no approximation is needed by evaluating it on many standard RL tasks.
Abstract: Real-time recurrent learning (RTRL) for sequence-processing recurrent neural networks (RNNs) offers certain conceptual advantages over backpropagation through time (BPTT). RTRL requires neither caching past activations nor truncating context, and enables online learning. However, RTRL's time and space complexity makes it impractical. To overcome this problem, most recent work on RTRL focuses on approximation theories, while experiments are often limited to diagnostic settings. Here we explore the practical promise of RTRL in more realistic settings. We study actor-critic methods that combine RTRL and policy gradients, and test them in several subsets of DMLab-30, ProcGen, and Atari-2600 environments. On DMLab memory tasks, our system is competitive with or outperforms the well-known IMPALA and R2D2 baselines trained on 10B frames, while using fewer than 1.2B environment frames. To scale to such challenging tasks, we focus on certain well-known neural architectures with element-wise recurrence, allowing for tractable RTRL without approximation. We also discuss rarely addressed limitations of RTRL in real-world applications, such as its complexity in the multi-layer case.
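To illustrate why element-wise recurrence makes exact RTRL tractable, here is a minimal sketch assuming a hypothetical single-layer cell h_t = tanh(W x_t + w ⊙ h_{t-1} + b) with a per-step squared-error loss. This is not the architecture used in the paper; the function name `rtrl_elementwise` and the parameters `W`, `w`, `b` are illustrative. Because unit i's recurrence involves only its own previous state, the sensitivity dh_t[i]/dθ is nonzero only for unit i's own parameters, so the RTRL traces have the same shapes as the parameters and exact gradients are accumulated online at every step without storing past activations.

```python
import numpy as np


def rtrl_elementwise(W, w, b, xs, targets):
    """Exact online (RTRL) gradients for a hypothetical element-wise recurrent cell.

    Cell (an illustrative assumption, not the paper's architecture):
        h_t = tanh(W @ x_t + w * h_{t-1} + b),   h_0 = 0
    Per-step loss: l_t = 0.5 * ||h_t - y_t||^2; total loss L = sum_t l_t.

    Unit i only feeds back into itself, so dh_t[i]/dtheta is nonzero only for
    row i of W, w[i] and b[i]. The traces therefore have the same shapes as
    the parameters, instead of N x |theta| as in a fully recurrent RNN.
    """
    N, D = W.shape
    h = np.zeros(N)
    # Sensitivity traces: S_W[i, j] = dh_t[i]/dW[i, j], S_w[i] = dh_t[i]/dw[i], ...
    S_W = np.zeros((N, D))
    S_w = np.zeros(N)
    S_b = np.zeros(N)
    # Gradient accumulators, updated at every step (no stored history).
    g_W = np.zeros_like(W, dtype=float)
    g_w = np.zeros_like(w, dtype=float)
    g_b = np.zeros_like(b, dtype=float)
    loss = 0.0
    for x, y in zip(xs, targets):
        h_prev = h
        h = np.tanh(W @ x + w * h_prev + b)
        d = 1.0 - h ** 2  # derivative of tanh at the pre-activation
        # Forward-mode recursion; each update uses the previous step's trace.
        S_W = d[:, None] * (x[None, :] + w[:, None] * S_W)
        S_w = d * (h_prev + w * S_w)
        S_b = d * (1.0 + w * S_b)
        # Combine the immediate loss gradient dl_t/dh_t with the traces.
        err = h - y
        loss += 0.5 * float(err @ err)
        g_W += err[:, None] * S_W
        g_w += err * S_w
        g_b += err * S_b
    return loss, g_W, g_w, g_b


# Usage on random data (illustrative sizes: N=4 units, D=3 inputs, T=6 steps).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 3))
w = rng.uniform(-0.9, 0.9, size=4)
b = np.zeros(4)
xs, ys = rng.normal(size=(6, 3)), rng.normal(size=(6, 4))
loss, g_W, g_w, g_b = rtrl_elementwise(W, w, b, xs, ys)
```

In this sketch the per-step cost of updating the traces is O(N·D), the same order as the forward pass. With stacked layers this sparsity no longer holds for lower-layer parameters: a second layer's units depend on all units of the first layer, so their traces with respect to first-layer parameters become dense, which is one way the multi-layer case becomes costly.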
Supplementary Material: zip
Submission Number: 12174