Abstract: Memory-based Deep Reinforcement Learning (DRL) has been successfully applied to solve vision-based control tasks from high-dimensional sensory data. While most of this work uses the Long Short-Term Memory (LSTM) as the agent's memory module, recent developments have revisited and extended the original LSTM formulation. These include the ConvLSTM, a convolutional implementation of the LSTM; the MDN-RNN, which combines a Mixture Density Network with an LSTM; and the GridLSTM, a multidimensional grid of LSTM cells. It remains unclear, however, how these different memory modules compare to each other in terms of agent performance when applied in the context of DRL. This work presents a comparative study of several memory-based DRL agents built on the LSTM, ConvLSTM, MDN-RNN, and GridLSTM memory modules. The results suggest that, in some cases, these more recent memory modules can improve agent performance, to varying degrees, when compared to a baseline agent based on an LSTM. The experimental results were validated on the Atari 2600 video game platform.
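As a point of reference for the kind of module the abstract names, the following is a minimal sketch of a ConvLSTM cell, in which the LSTM's matrix multiplications are replaced by convolutions so the hidden and cell states retain spatial structure. This is an illustrative implementation assuming PyTorch; the channel counts, kernel size, and feature-map shape below are hypothetical, not taken from the paper.

```python
# Minimal ConvLSTM cell sketch (hypothetical hyperparameters).
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2  # preserve spatial resolution
        # A single convolution computes all four gates at once.
        self.conv = nn.Conv2d(in_channels + hidden_channels,
                              4 * hidden_channels,
                              kernel_size, padding=padding)

    def forward(self, x, state):
        h, c = state  # hidden and cell states, shape (B, C_h, H, W)
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = gates.chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g      # cell-state update, as in a standard LSTM
        h = o * torch.tanh(c)  # new hidden state keeps its spatial layout
        return h, (h, c)

# Usage on a toy feature map (sizes are illustrative only):
cell = ConvLSTMCell(in_channels=32, hidden_channels=64)
x = torch.randn(1, 32, 21, 21)
h = torch.zeros(1, 64, 21, 21)
c = torch.zeros(1, 64, 21, 21)
out, (h, c) = cell(x, (h, c))
print(out.shape)  # torch.Size([1, 64, 21, 21])
```

The design point this sketch highlights is that, unlike a plain LSTM applied to a flattened vector, every gate here is itself a feature map, which is what makes the module a natural fit for the convolutional encoders used in vision-based DRL agents.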