Keywords: online reinforcement learning, Bayesian optimization
Abstract: Most work in online reinforcement learning (RL) tunes hyperparameters in a separate offline phase, without accounting for the environment interaction that this tuning consumes. This empirical methodology is reasonable for assessing how well algorithms {\em can perform}, but it is limited when evaluating algorithms for practical deployment in the real world. In many applications, the environment does not permit exhaustive hyperparameter searches, and typical evaluations do not characterize how much data such searches require. In this work, we explore \emph{online tuning}, where the agent must select hyperparameters during online interaction. Hyperparameter tuning is part of the agent rather than done in a separate (hidden) tuning phase. We layer sequential Bayesian optimization on top of standard RL algorithms and assess their behavior when tuning hyperparameters online. We show the expected result: the success of this strategy depends on the environment and the algorithm. To address this issue, we try a "naive" but smarter approach to online tuning that mitigates wasteful resetting, and show that it can achieve comparable results, highlighting the benefits of smarter online tuning approaches.
Submission Number: 21