- Abstract: We consider reproducibility in a wider, practical sense for deep reinforcement learning (RL) on control tasks, where it holds promise as a general-purpose solution. However, unlike in other branches of machine learning, RL benchmarks for practical reasons use only simulated data. This places greater responsibility on experiment design if advances on benchmarks are to translate to real-world performance. There has been a string of successes in solving ever more impressive-looking problems in the Mujoco physics simulator. However, although these benchmarks involve control of agents with different dynamics, there is reason to believe that they are in other respects fairly homogeneous, e.g. having deterministic state transitions and quadratic objectives. For instance, it was recently shown that many of them are solvable by a simple linear policy, together with a suggestion to widen the initial state distribution to avoid fragile "trajectory-centric" policies. We take this further and argue that the environments themselves should have both inherent uncertainty and more complex objectives. Autonomous robots and vehicles have to deal with a complex, uncertain world that changes over time. To exemplify this, we introduce a seemingly simple benchmark in which randomly moving obstacles have to be avoided. This both precludes a trajectory-centric solution and involves a more difficult objective than is commonly used. We show that this toy example actually has higher sample complexity than the Mujoco benchmarks. We further find that the level of robustness expected of real-world autonomous agents leads to requirements on tail-end convergence that are rarely given much consideration in benchmarks, resulting in difficult tuning problems. Finally, while some environment variation has recently been demonstrated for, e.g., complex humanoid agents in Mujoco, training times increasingly require massive cloud infrastructure; we therefore suggest that toy examples designed to test different problem dimensions have an important role to play in research.
- TL;DR: We identify some neglected problem dimensions that would make deep RL control benchmarks more realistic. We evaluate a seemingly simple toy example and find that it actually has higher sample complexity than standard Mujoco benchmarks.
- Keywords: Deep Reinforcement Learning, Benchmarks, Robustness, Control