Abstract: It is well established that Reinforcement Learning (RL) is brittle and highly sensitive to the choice of hyperparameters, which prevents RL methods from being usable out of the box.
The field of automated RL (AutoRL) aims to automatically configure the RL pipeline, both to make RL usable by a broader audience and to reveal its full potential.
Still, there has been little progress towards this goal, as new AutoRL methods are often evaluated under incompatible experimental protocols.
Furthermore, the typically high cost of experimentation prevents a thorough and meaningful comparison of different AutoRL methods, or of established hyperparameter optimization (HPO) methods from the automated Machine Learning (AutoML) community.
To alleviate these issues, we propose the first tabular AutoRL benchmark for studying the hyperparameters of RL algorithms. We consider the hyperparameter search spaces of five well-established RL methods (PPO, DDPG, A2C, SAC, TD3) across 22 environments, for which we compute and provide the reward curves. This enables HPO methods to simply query our benchmark as a lookup table instead of actually training agents. Thus, our benchmark offers a testbed for very fast, fair, and reproducible experimental protocols for comparing future black-box, gray-box, and online HPO methods for RL.
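To illustrate the lookup-table idea, the following is a minimal Python sketch of how an HPO method could query precomputed reward curves by algorithm, environment, and configuration. The class, file name, and data layout are illustrative assumptions, not the benchmark's actual API.

import json


class TabularRLBenchmark:
    # Assumed data format: {algorithm: {environment: {config_id: [reward per step]}}}
    def __init__(self, path):
        with open(path) as f:
            self.table = json.load(f)

    def query(self, algorithm, environment, config_id, budget=None):
        # Return the precomputed reward curve, optionally truncated to a step budget,
        # instead of training an RL agent for this configuration.
        curve = self.table[algorithm][environment][config_id]
        return curve if budget is None else curve[:budget]


# Example: a black-box optimizer evaluates a PPO configuration on an environment
# by reading the stored curve rather than running RL training.
bench = TabularRLBenchmark("reward_curves.json")  # hypothetical file
curve = bench.query("PPO", "CartPole-v1", config_id="lr=3e-4,gamma=0.99", budget=100)
final_return = curve[-1]  # objective value reported back to the HPO method

Gray-box and online HPO methods would use the same lookup but read intermediate points of the curve (partial budgets) rather than only the final return.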