Recurrent Model-Free RL is a Strong Baseline for Many POMDPs

Tianwei Ni; Benjamin Eysenbach; Sergey Levine; Ruslan Salakhutdinov

Published: 28 Jan 2022, Last Modified: 22 Jun 2025, ICLR 2022 Submitted, Readers: Everyone

Keywords: POMDP, RNN, recurrent model-free RL, baseline, meta RL, robust RL, generalization in RL

Abstract: Many problems in RL, such as meta RL, robust RL, and generalization in RL, can be cast as POMDPs. In theory, simply augmenting model-free RL with memory, such as recurrent neural networks, provides a general approach to solving all types of POMDPs. However, prior work has found that such recurrent model-free RL methods tend to perform worse than more specialized algorithms that are designed for specific types of POMDPs. This paper revisits this claim. We find that careful architecture and hyperparameter decisions yield a recurrent model-free implementation that performs on par with (and occasionally substantially better than) more sophisticated recent techniques in their respective domains. We also release a simple and efficient implementation of recurrent model-free RL for future work to use as a baseline for POMDPs.
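To make the idea of "augmenting model-free RL with memory" concrete, the following is a minimal sketch of a recurrent policy, not the authors' released implementation. The class `RecurrentPolicy`, the helper `rollout`, the layer sizes, and the plain Elman-style update are all assumptions chosen for illustration; the weights are random and untrained, and the RL training loop is omitted. The sketch only shows that the action depends on the observation history through a hidden state, so two episodes ending in the same observation can produce different actions.

```python
import numpy as np


class RecurrentPolicy:
    """Hypothetical Elman-style policy: the hidden state summarizes past observations."""

    def __init__(self, obs_dim, hidden_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Random, untrained weights (illustration only).
        self.W_in = rng.normal(0.0, 0.5, (hidden_dim, obs_dim))
        self.W_h = rng.normal(0.0, 0.5, (hidden_dim, hidden_dim))
        self.W_out = rng.normal(0.0, 0.5, (act_dim, hidden_dim))
        self.hidden_dim = hidden_dim

    def initial_state(self):
        # Memory is reset to zeros at the start of every episode.
        return np.zeros(self.hidden_dim)

    def step(self, obs, h):
        # Update the memory from the previous hidden state and the new observation.
        h_next = np.tanh(self.W_in @ obs + self.W_h @ h)
        # The bounded continuous action is computed from the memory, not from obs alone.
        action = np.tanh(self.W_out @ h_next)
        return action, h_next


def rollout(policy, observations):
    """Feed a sequence of observations through the policy and collect its actions."""
    h = policy.initial_state()
    actions = []
    for obs in observations:
        action, h = policy.step(np.asarray(obs, dtype=float), h)
        actions.append(action)
    return np.stack(actions)


policy = RecurrentPolicy(obs_dim=2, hidden_dim=8, act_dim=1)

# Two histories that end in the same observation but differ in the first step.
history_a = [[1.0, 0.0], [0.0, 0.0]]
history_b = [[-1.0, 0.0], [0.0, 0.0]]

actions_a = rollout(policy, history_a)
actions_b = rollout(policy, history_b)
```

A memoryless policy would have to return the same action for the final step of both histories, because the last observation is identical. Here the final actions differ because the hidden state carries information from the first step, which is the property a POMDP agent needs.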

One-sentence Summary: Recurrent model-free RL is competitive with more sophisticated methods on partially-observed tasks, provided that some design decisions are made carefully.

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/recurrent-model-free-rl-is-a-strong-baseline/code)
