Hyperparameters in Reinforcement Learning and How To Tune Them

Published: 20 Jul 2023, Last Modified: 31 Aug 2023
Venue: EWRL16
Readers: Everyone
Keywords: Reinforcement Learning, AutoRL, Hyperparameter Optimization, Reproducibility
TL;DR: HPO for RL is possible even on low compute budgets - but better reporting of HPO standards is important for reproducibility
Abstract: Deep Reinforcement Learning (RL) has been adopting better scientific practices to improve reproducibility, such as standardized evaluation metrics and reporting, as well as greater attention to implementation details and design decisions. However, the process of hyperparameter optimization still varies widely across papers, with inefficient grid searches being the most common approach. This makes fair comparisons between RL algorithms challenging. In this paper, we show that hyperparameter choices in RL can significantly affect the agent’s final performance and sample efficiency, and that the hyperparameter landscape can strongly depend on the tuning seed, which can lead to overfitting to single seeds. We therefore propose adopting established best practices from AutoML, such as the separation of tuning and testing seeds, as well as principled hyperparameter optimization (HPO) across a broad search space. We support this by comparing multiple state-of-the-art HPO tools against hand-tuned counterparts on a range of RL algorithms and environments, demonstrating that HPO approaches often achieve higher performance with lower compute overhead. Based on our findings, we recommend a set of best practices for the RL community going forward, which should lead to stronger empirical results at lower computational cost, better reproducibility, and thus faster progress in RL. To encourage the adoption of these practices, we provide plug-and-play implementations of the tuning algorithms used in this paper at https://anonymous.4open.science/r/how-to-autorl-DE67/README.md.
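The separation of tuning and testing seeds advocated in the abstract can be illustrated with a minimal sketch. The snippet below is not taken from the paper's released implementations: it uses plain random search as a simple stand-in for the HPO tools compared in the paper, and train_and_evaluate, SEARCH_SPACE, and the seed lists are hypothetical placeholders for an actual RL training pipeline.

```python
import random

# Hypothetical stand-in for training an RL agent with given hyperparameters
# on one seed and returning its mean evaluation return. In practice this
# would wrap a real training run (e.g. PPO); here it is a dummy function
# so the sketch runs end to end.
def train_and_evaluate(hyperparams, seed):
    rng = random.Random(seed)
    return -abs(hyperparams["lr"] - 3e-4) * 1e4 + rng.gauss(0, 1)

# A broad search space, sampled per trial (assumed for illustration).
SEARCH_SPACE = {
    "lr": lambda rng: 10 ** rng.uniform(-5, -2),
    "gamma": lambda rng: rng.uniform(0.9, 0.9999),
}

TUNING_SEEDS = [0, 1, 2]                 # seeds used during HPO only
TEST_SEEDS = [100, 101, 102, 103, 104]   # disjoint seeds, used once at the end

def random_search(n_trials=20, rng_seed=42):
    rng = random.Random(rng_seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: sample(rng) for name, sample in SEARCH_SPACE.items()}
        # Average over several tuning seeds to avoid overfitting to one seed.
        score = sum(train_and_evaluate(config, s)
                    for s in TUNING_SEEDS) / len(TUNING_SEEDS)
        if score > best_score:
            best_config, best_score = config, score
    return best_config

if __name__ == "__main__":
    incumbent = random_search()
    # Report final performance only on held-out test seeds never seen in tuning.
    test_scores = [train_and_evaluate(incumbent, s) for s in TEST_SEEDS]
    print("incumbent:", incumbent)
    print("test mean:", sum(test_scores) / len(test_scores))
```

The point of the sketch is that the incumbent configuration is selected using only TUNING_SEEDS, while the reported score comes exclusively from the disjoint TEST_SEEDS, so the final numbers cannot benefit from seed-level overfitting during tuning.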
Already Accepted Paper At Another Venue: already accepted somewhere else