Abstract: Reinforcement learning (RL) methods are known to be highly sensitive to their hyperparameter settings and costly to evaluate. In light of this, surrogate models that predict the performance of an algorithm given a hyperparameter configuration are an attractive option for understanding and optimising these computationally expensive tasks. In this work, we study such surrogates for RL and find that RL methods pose a significant challenge to current performance prediction approaches. Specifically, RL landscapes appear to be rugged and noisy, which leads to poor surrogate model quality. Even when surrogate models are used only for gaining insights into hyperparameter landscapes, rather than as replacements for algorithm evaluations in benchmarking, we find that they deviate substantially from the ground truth. While our evaluation highlights the limits of surrogate modelling for RL, we also propose a method for automatically reducing configuration spaces to improve surrogate performance. Finally, we derive recommendations for RL practitioners that caution against blindly trusting surrogate-based methods in this domain and highlight where and how they can be used.
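To illustrate the core idea the abstract describes, below is a minimal, hypothetical sketch of a performance-prediction surrogate. It is not the paper's method: it uses an invented one-dimensional "rugged and noisy" performance landscape over a single hyperparameter and a simple k-nearest-neighbour regressor as the surrogate, then measures how far the surrogate's predictions deviate from the noise-free ground truth.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rugged, noisy "RL performance" landscape over one
# hyperparameter in [0, 1]: a smooth trend plus high-frequency
# ruggedness, with per-evaluation noise (seed-to-seed variation).
def evaluate(x, noise=0.3):
    signal = -(x - 0.5) ** 2 + 0.1 * np.sin(40 * x)
    return signal + rng.normal(0.0, noise, size=np.shape(x))

# A small budget of evaluated configurations, as in expensive RL runs.
X_train = rng.uniform(0.0, 1.0, size=64)
y_train = evaluate(X_train)

# Minimal surrogate: predict the mean observed return of the k
# evaluated configurations closest to the query configuration.
def knn_surrogate(x_query, X, y, k=5):
    idx = np.argsort(np.abs(X - x_query))[:k]
    return y[idx].mean()

# Compare surrogate predictions against the noise-free landscape.
X_test = rng.uniform(0.0, 1.0, size=200)
y_true = -(X_test - 0.5) ** 2 + 0.1 * np.sin(40 * X_test)
y_pred = np.array([knn_surrogate(x, X_train, y_train) for x in X_test])

rmse = float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
print(f"surrogate RMSE vs. noise-free landscape: {rmse:.3f}")
```

Because the training signal is noisy, the surrogate's error stays well above zero even with a reasonable evaluation budget, which is a toy analogue of the abstract's observation that ruggedness and noise degrade surrogate quality.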
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Adam_M_White1
Submission Number: 7665