Memory Based Reinforcement Learning with Transformers for Long Horizon Timescales and Continuous Action Spaces

Shweta Singh, Sudaman Rajesh Katti

17 May 2023, OpenReview Archive Direct Upload, Readers: Everyone

Abstract: The best-known sequence models use complex recurrent neural networks in an encoder-decoder configuration. The model used in this research instead uses a transformer, which is based purely on the self-attention mechanism and does not rely on recurrence at all. More specifically, it uses encoders and decoders that apply self-attention and operate on a memory. Results were obtained for various 3D visual and non-visual reinforcement learning tasks designed in Unity. For visual tasks, a convolutional neural network (specifically, the Nature CNN architecture) processes the input, and the transformer is compared with a standard long short-term memory (LSTM) architecture on both the CNN-based visual tasks and the non-visual tasks based on coordinate inputs. This work combines the transformer architecture with proximal policy optimization, a technique widely used in reinforcement learning for stable training and better policy updates, especially in continuous action spaces such as those used here. Some of the tasks in this paper are long-horizon tasks, which run for a longer duration and rely heavily on memory-based functions such as storing experiences and choosing appropriate actions based on recall. The transformer, which uses memory and the self-attention mechanism in an encoder-decoder configuration, performed better than the LSTM in terms of exploration and rewards achieved. Such memory-based architectures can be used extensively in cognitive robotics and reinforcement learning.
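
The two ingredients named in the abstract, self-attention over a memory of past experiences and the clipped policy update of proximal policy optimization, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `attend_over_memory` and `ppo_clipped_objective`, the single-query single-head form of attention, and the clipping value `clip_eps=0.2` are all assumptions chosen for illustration.

```python
import math


def attend_over_memory(query, memory):
    """Scaled dot-product attention of one query vector over memory rows.

    Hypothetical helper: `memory` is a list of stored feature vectors
    (e.g. encoded past observations), `query` is the current one.
    """
    scale = math.sqrt(len(query))
    # Similarity of the query to every stored memory row.
    scores = [sum(q * m for q, m in zip(query, row)) / scale for row in memory]
    # Softmax with max-subtraction for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the memory rows.
    context = [
        sum(w * row[i] for w, row in zip(weights, memory))
        for i in range(len(query))
    ]
    return context, weights


def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized).

    `ratios` are new-policy / old-policy probability ratios per sample;
    `clip_eps` is an assumed value, not one taken from the paper.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(r * a, clipped_r * a))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    memory = [[1.0, 0.0], [0.0, 1.0]]
    context, weights = attend_over_memory([1.0, 0.0], memory)
    print("attention weights:", weights)
    print("context vector:", context)
    print("objective:", ppo_clipped_objective([1.5, 0.5], [1.0, -1.0]))
```

In this sketch the memory row most similar to the query receives the largest attention weight, and the clipping keeps a single update from moving the policy too far from the one that collected the data, which is the stability property the abstract attributes to proximal policy optimization.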
