Keywords: Spiking Neural Networks, Sequence Models, Reinforcement Learning
Abstract: Efficient long-term memory is important for improving the sample efficiency of partially observable reinforcement learning. In memory-based RL methods, long-term memory capacity depends on the sequence models used in the agent architecture. Two main approaches improve the long-term dependency of sequence models: linear recurrence and information-selection mechanisms such as gating. However, the sample efficiency of existing approaches remains low on long-term memory tasks. In this paper, we first present a saliency-based framework to explain why existing methods perform poorly on long-term memory tasks. Specifically, they cannot effectively filter out noisy information irrelevant to the memory task in the early stages of training. Motivated by this, we design a novel linear recurrent module in which the gating is controlled by spiking neurons. Spiking neurons output discrete values and can more effectively mask noise early in training, thus improving sample efficiency. The effectiveness of our proposed module is demonstrated on Passive Visual Match, a classic long-term memory task, and on several other types of partially observable tasks. The code is attached in the supplementary material and will be made publicly available.
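The module described above can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's actual implementation): a linear recurrence whose input gate is the binary spike output of a leaky integrate-and-fire (LIF) neuron, so sub-threshold (noisy) inputs are masked out of the memory state. All names, thresholds, and decay constants here are illustrative assumptions; a trainable version would additionally need a surrogate gradient for the spike nonlinearity.

```python
import numpy as np

def lif_gate(u_prev, inp, tau=0.9, v_th=1.0):
    """Leaky integrate-and-fire neuron used as a binary gate (illustrative sketch).

    u_prev: previous membrane potential; inp: gating drive for this step.
    Returns (spike, updated membrane potential). Constants tau and v_th
    are assumed values, not taken from the paper.
    """
    u = tau * u_prev + inp              # leaky membrane-potential update
    spike = (u >= v_th).astype(float)   # discrete 0/1 spike output
    u = u * (1.0 - spike)               # hard reset after spiking
    return spike, u

def spiking_gated_recurrence(x, w_gate, decay=0.95):
    """Linear recurrence h_t = decay * h_{t-1} + s_t * x_t,
    where the gate s_t is the spike output of a LIF neuron driven by x_t.

    x: (T, d) input sequence; w_gate: (d,) gating weights (assumed elementwise).
    Returns the (T, d) sequence of hidden states.
    """
    T, d = x.shape
    h = np.zeros(d)   # linear recurrent memory state
    u = np.zeros(d)   # LIF membrane potential
    hs = []
    for t in range(T):
        s, u = lif_gate(u, x[t] * w_gate)
        # Binary spikes mask inputs: only supra-threshold inputs enter memory,
        # which is the noise-filtering behavior described in the abstract.
        h = decay * h + s * x[t]
        hs.append(h.copy())
    return np.stack(hs)
```

For example, a constant drive of 0.6 per step with tau = 0.9 and v_th = 1.0 crosses threshold only every other step, so the memory state is updated intermittently rather than absorbing every (possibly noisy) input.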
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 19161