HPO-RL-Bench: A Zero-Cost Benchmark for HPO in Reinforcement Learning

Published: 30 Apr 2024, Last Modified: 19 Jul 2024, AutoML 2024 (ABCD Track), CC BY 4.0
Keywords: benchmark, hyperparameter optimization, reinforcement learning
TL;DR: Tabular benchmark for hyperparameter optimization of model-free RL algorithms on Atari, MuJoCo, and Classic Control environments.
Abstract: Despite the undeniable importance of optimizing the hyperparameters of RL algorithms, existing state-of-the-art Hyperparameter Optimization (HPO) techniques are not frequently utilized by RL researchers. To catalyze HPO research in RL, we present a new large-scale benchmark that includes pre-computed reward curve evaluations of hyperparameter configurations for six established RL algorithms (PPO, DDPG, A2C, SAC, TD3, DQN) on 22 environments (Atari, MuJoCo, Classic Control), repeated for multiple seeds. For each RL algorithm in each environment, we exhaustively computed reward curves for every hyperparameter configuration in the considered search spaces. As a result, our benchmark permits zero-cost experiments for deploying and comparing new HPO methods. In addition, the benchmark offers a set of integrated HPO methods, enabling plug-and-play tuning of the hyperparameters of new RL algorithms, while the pre-computed evaluations allow a zero-cost comparison of a new RL algorithm against the tuned RL baselines in our benchmark.
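The zero-cost workflow the abstract describes can be illustrated with a minimal sketch. This is not the benchmark's actual API; the table contents, function names, and hyperparameter tuples below are hypothetical stand-ins showing how an HPO method queries pre-computed reward curves instead of training an agent:

```python
# Hypothetical in-memory stand-in for a tabular HPO benchmark:
# (algorithm, environment, config) -> pre-computed reward curve.
# Configs here are illustrative (learning_rate, gamma) pairs, not
# the benchmark's real search space.
PRECOMPUTED = {
    ("PPO", "CartPole-v1", (0.001, 0.99)): [20.0, 85.0, 160.0, 200.0],
    ("PPO", "CartPole-v1", (0.010, 0.99)): [15.0, 40.0, 55.0, 60.0],
    ("PPO", "CartPole-v1", (0.001, 0.90)): [18.0, 70.0, 120.0, 150.0],
}

def query(algorithm, environment, config):
    """Zero-cost evaluation: a table lookup replaces an RL training run."""
    return PRECOMPUTED[(algorithm, environment, config)]

def exhaustive_search(algorithm, environment, configs):
    """Toy HPO baseline: try every configuration, keep the best final reward."""
    best_config, best_reward = None, float("-inf")
    for cfg in configs:
        final_reward = query(algorithm, environment, cfg)[-1]
        if final_reward > best_reward:
            best_config, best_reward = cfg, final_reward
    return best_config, best_reward

configs = [(0.001, 0.99), (0.010, 0.99), (0.001, 0.90)]
best, reward = exhaustive_search("PPO", "CartPole-v1", configs)
# best is the config whose stored curve ends highest; no training occurred.
```

Any HPO method (random search, Bayesian optimization, bandits) can be benchmarked this way, since each proposed configuration is scored by a lookup rather than a costly training run.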
Submission Checklist: Yes
Broader Impact Statement: Yes
Paper Availability And License: Yes
Code Of Conduct: Yes
Optional Meta-Data For Green-AutoML:
GPU Hours: 274320
Evaluation Metrics: Yes
Submission Number: 13