Keywords: Hyperparameter Tuning, Reinforcement Learning, Benchmarking
TL;DR: We show that tuning $m$ hyperparameters in RL incurs an exponential sample complexity cost, and propose metrics that account for this overhead to achieve fair algorithm evaluation.
Abstract: The performance of reinforcement learning (RL) algorithms is often benchmarked without accounting for the cost of hyperparameter tuning, despite its significant practical impact. In this position paper, we argue that such practices distort the perceived efficiency of RL methods and impede meaningful algorithmic progress. We formalize this concern by proving a lower bound showing that tuning $m$ hyperparameters in RL necessarily induces an exponential $\exp(m)$ blow-up in sample complexity or regret, in stark contrast to the linear $O(m)$ overhead observed in supervised learning. This highlights a fundamental inefficiency unique to RL. To address it, we propose evaluation protocols that account for the number and cost of tuned hyperparameters, enabling fairer comparisons across algorithms. Surprisingly, we find that once tuning cost is included, elementary algorithms can outperform their more sophisticated successors. These findings call for a shift in how RL algorithms are benchmarked and compared, especially in settings where efficiency and scalability are critical.
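To make the contrast concrete, a minimal illustrative sketch of the claimed scaling is shown below; the tuning-cost notation $N_{\mathrm{SL}}(m)$ and $N_{\mathrm{RL}}(m)$ is assumed here for illustration and is not the paper's formal theorem statement:
\[
  \underbrace{N_{\mathrm{SL}}(m) \;=\; O(m)}_{\text{supervised learning}}
  \qquad\text{vs.}\qquad
  \underbrace{N_{\mathrm{RL}}(m) \;=\; \Omega\!\bigl(\exp(m)\bigr)}_{\text{reinforcement learning}}
\]
where $m$ is the number of tuned hyperparameters and each quantity denotes the additional sample complexity (or regret) incurred by the tuning process itself.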
Submission Number: 84