- Keywords: POMDP, RNN, recurrent model-free RL, baseline, meta RL, robust RL, generalization in RL
- Abstract: Many problems in RL, such as meta RL, robust RL, and generalization in RL, can be cast as POMDPs. In theory, simply augmenting model-free RL with memory, such as recurrent neural networks, provides a general approach to solving all types of POMDPs. However, prior work has found that such recurrent model-free RL methods tend to perform worse than more specialized algorithms that are designed for specific types of POMDPs. This paper revisits this claim. We find that careful architecture and hyperparameter decisions yield a recurrent model-free implementation that performs on par with (and occasionally substantially better than) more sophisticated recent techniques in their respective domains. We also release a simple and efficient implementation of recurrent model-free RL for future work to use as a baseline for POMDPs.
- One-sentence Summary: Recurrent model-free RL is competitive with more sophisticated methods on partially observed tasks, provided that architecture and hyperparameter decisions are made carefully.