MDP Playground: Controlling Orthogonal Dimensions of Hardness in Toy Environments

Raghu Rajan; Jessica Lizeth Borja Diaz; Suresh Guttikonda; Fabio Ferreira; André Biedenkapp; Frank Hutter

28 Sept 2020 (modified: 05 May 2023) | ICLR 2021 Conference Blind Submission | Readers: Everyone

Keywords: Reinforcement learning, Benchmarks, Efficiency, Reproducibility, Core issues, Algorithm analysis, Dimensions of hardness, OpenAI Gym

Abstract: We present MDP Playground, an efficient benchmark for Reinforcement Learning (RL) algorithms with various dimensions of hardness that can be controlled independently to challenge algorithms in different ways and to obtain varying degrees of hardness in generated environments. We consider and allow control over a wide variety of key hardness dimensions, including delayed rewards, rewardable sequences, sparsity of rewards, stochasticity, image representations, irrelevant features, time unit, and action max. While it is very time-consuming to run RL algorithms on standard benchmarks, we define a parameterised collection of fast-to-run toy benchmarks in OpenAI Gym by varying these dimensions. Despite their toy nature and low compute requirements, we show that these benchmarks present substantial challenges to current RL algorithms. Furthermore, since we can generate environments with a desired value for each of the dimensions, we have not only fine-grained control over the environments' hardness but also the ground truth available for evaluating algorithms. Finally, we evaluate the kinds of transfer for these dimensions that may be expected from our benchmarks to more complex benchmarks. We believe that MDP Playground is a valuable testbed for researchers designing new, adaptive and intelligent RL algorithms and those wanting to unit test their algorithms.
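
To make the idea of independently controllable hardness dimensions concrete, the following is a minimal sketch of a toy discrete MDP with two such knobs, reward delay and transition noise. It is not the MDP Playground API: the class name `ToyDiscreteMDP`, its parameters, the cyclic transition rule, and the choice of a single rewarding state are all assumptions made for illustration only.

```python
import random
from collections import deque


class ToyDiscreteMDP:
    """Hypothetical toy MDP with two hardness knobs (not the MDP Playground API)."""

    def __init__(self, n_states=4, n_actions=4, delay=0,
                 transition_noise=0.0, seed=0):
        self.n_states = n_states
        self.n_actions = n_actions
        self.delay = delay                        # steps by which each reward is postponed
        self.transition_noise = transition_noise  # probability of a uniformly random next state
        self.rng = random.Random(seed)
        # Assumption: the last state is the only rewarding one.
        self.rewarding_state = n_states - 1
        self.reset()

    def reset(self):
        self.state = 0
        # A queue pre-filled with `delay` zeros postpones every earned reward.
        self.pending = deque([0.0] * self.delay)
        return self.state

    def step(self, action):
        if self.rng.random() < self.transition_noise:
            # Noisy transition: jump to a uniformly random state.
            self.state = self.rng.randrange(self.n_states)
        else:
            # Assumed deterministic dynamics: move `action` states forward, cyclically.
            self.state = (self.state + action) % self.n_states
        earned = 1.0 if self.state == self.rewarding_state else 0.0
        self.pending.append(earned)
        reward = self.pending.popleft()  # reward earned `delay` steps ago
        return self.state, reward, False, {}


if __name__ == "__main__":
    for delay in (0, 2):
        env = ToyDiscreteMDP(delay=delay)
        env.reset()
        rewards = [env.step(1)[1] for _ in range(6)]
        print(f"delay={delay}: {rewards}")
```

Because each knob is a separate constructor argument, one dimension can be varied while the others are held fixed. With noise set to zero, the reward an agent should observe at every step is known exactly, which is the sense in which such generated environments provide a ground truth.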

One-sentence Summary: Toy benchmarks for Reinforcement Learning (RL) algorithms with controllable dimensions of hardness

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Reviewed Version (pdf): https://openreview.net/references/pdf?id=x3eS5U4Oms
