Memory Based Reinforcement Learning with Transformers for Long Horizon Timescales and Continuous Action Spaces
Abstract: The best-known sequence models use complex recurrent
neural networks in an encoder-decoder configuration. The model
used in this research is a transformer, which is based purely on
the self-attention mechanism and does not rely on recurrence at
all. More specifically, it uses encoders and decoders that apply
self-attention and operate on a memory. In this research work,
results for various 3D
visual and non-visual reinforcement learning tasks designed in
Unity software were obtained. A convolutional neural network
(CNN), specifically the Nature CNN architecture, is used for input
processing in visual tasks, and the model is compared with a
standard long short-term memory (LSTM) architecture on both
visual tasks based on CNNs and non-visual tasks based on
coordinate inputs. This research work combines the transformer
architecture with proximal policy optimization (PPO), a technique
widely used in reinforcement learning for its stability and
improved policy updates during training, especially in continuous
action spaces such as those used in this work. Some of the tasks
in this paper are long-horizon tasks, which run for a longer
duration and rely heavily on memory-based functions such as
storing experiences and choosing appropriate actions based on
recall. The transformer, which uses memory and the self-attention
mechanism in an encoder-decoder configuration, outperformed the
LSTM in terms of exploration and rewards achieved. Such memory-based
architectures can be used extensively in the field of cognitive
robotics and reinforcement learning.