HPO-RL-Bench: A Zero-Cost Benchmark for HPO in Reinforcement Learning

Gresa Shala; Sebastian Pineda Arango; André Biedenkapp; Frank Hutter; Josif Grabocka

Published: 30 Apr 2024, Last Modified: 07 Aug 2024. Venue: AutoML 2024 (ABCD Track). License: CC BY 4.0

Keywords: benchmark, hyperparameter optimization, reinforcement learning

TL;DR: Tabular benchmark for hyperparameter optimization of model-free RL algorithms on Atari, MuJoCo, and Classic Control environments.

Abstract: Despite the undeniable importance of optimizing the hyperparameters of RL algorithms, existing state-of-the-art Hyperparameter Optimization (HPO) techniques are not frequently utilized by RL researchers. To catalyze HPO research in RL, we present a new large-scale benchmark that includes pre-computed reward curve evaluations of hyperparameter configurations for six established RL algorithms (PPO, DDPG, A2C, SAC, TD3, DQN) on 22 environments (Atari, MuJoCo, Control), repeated for multiple seeds. We exhaustively computed the reward curves of all possible combinations of hyperparameters for the considered hyperparameter spaces for each RL algorithm in each environment. As a result, our benchmark permits zero-cost experiments for deploying and comparing new HPO methods. In addition, the benchmark offers a set of integrated HPO methods, enabling plug-and-play tuning of the hyperparameters of new RL algorithms, while pre-computed evaluations allow a zero-cost comparison of a new RL algorithm against the tuned RL baselines in our benchmark.
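
As a rough illustration of what "zero-cost" means for a tabular benchmark, the sketch below runs random search where every evaluation is a table lookup rather than an RL training run. It is not the HPO-RL-Bench API: the search space, the seed list, the function names (`build_toy_table`, `query`, `random_search`), and the reward values are all hypothetical placeholders, and the curves are synthetic random numbers rather than benchmark results.

```python
import itertools
import random

# Hypothetical toy search space; the benchmark's real spaces are not reproduced here.
SEARCH_SPACE = {
    "lr": [1e-4, 1e-3, 1e-2],
    "gamma": [0.9, 0.99],
}
SEEDS = [0, 1, 2]
EPOCHS = 5


def build_toy_table():
    """Fill a lookup table with synthetic reward curves (placeholders, not real results)."""
    rng = random.Random(0)
    table = {}
    # Exhaustive grid: every combination of hyperparameter values, for every seed.
    for config in itertools.product(*SEARCH_SPACE.values()):
        for seed in SEEDS:
            table[(config, seed)] = [rng.random() for _ in range(EPOCHS)]
    return table


def query(table, config, seed, budget):
    """Return the stored reward curve truncated to `budget` epochs; nothing is trained."""
    return table[(config, seed)][:budget]


def random_search(table, n_trials, seed=0):
    """Random search in which each evaluation is a table lookup."""
    rng = random.Random(seed)
    configs = list(itertools.product(*SEARCH_SPACE.values()))
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = rng.choice(configs)
        # Score a configuration by its final reward averaged over seeds.
        finals = [query(table, config, s, EPOCHS)[-1] for s in SEEDS]
        score = sum(finals) / len(finals)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


TABLE = build_toy_table()
BEST_CONFIG, BEST_SCORE = random_search(TABLE, n_trials=10)
```

Because the grid is evaluated exhaustively in advance, any HPO method that proposes configurations from the grid can be compared in this way without further compute; the `budget` argument hints at how multi-fidelity methods could read partial reward curves.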

Submission Checklist: Yes

Broader Impact Statement: Yes

Paper Availability And License: Yes

Code Of Conduct: Yes

GPU Hours: 274320

Evaluation Metrics: Yes

Submission Number: 13
