Abstract: The performance of reinforcement learning (RL) agents is sensitive to the choice of hyperparameters. In real-world settings like robotics or industrial control systems, however, testing different hyperparameter configurations directly on the environment can be financially prohibitive, dangerous, or time-consuming. We focus on hyperparameter tuning from offline logs of data, to fully specify the hyperparameters for an RL agent that learns online in the real world. The approach is conceptually simple: we first learn a model of the environment from the offline data, which we call a calibration model, and then simulate learning in the calibration model to identify promising hyperparameters. Though such a natural idea is (likely) being used in industry, it has yet to be systematically investigated. We identify several criteria to make this strategy effective, and develop an approach that satisfies these criteria. We empirically investigate the method in a variety of settings to identify when it is effective and when it fails.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We have now incorporated several of the clarifications requested by the reviewers. We have additionally made the following larger changes.
1. In terms of novelty, we now emphasize more upfront that building and using simulators to prototype algorithms for deployment is a natural idea likely being used in industry. We have particularly emphasized the most similar work mentioned by the reviewer, on building simulators for gas turbine control. We have further clarified our particular goal and novelty in this work, and how it contrasts with that work. (We thank the reviewers for helping us see this source of confusion, to better place the work!)
2. We have included a discussion (and citations) about using RNNs for the calibration model. We have also softened strong statements about issues with NNs; we now simply contrast the KNN approach with that of using NNs, and emphasize that NNs can actually be used to make the KNN approach better (as we do). This is now more clearly separated into its own section, Section 5.4.
3. We have emphasized certain aspects of the problem formulation, including (a) that data is collected previously and given to the agent (it does not count towards sample use) and (b) why the model needs to be iterated for many steps. We use our real application as a motivating example throughout, but cannot yet share the results. We are, however, actively working on a follow-up, building on the insights we gained from this work.
Assigned Action Editor: ~Marcello_Restelli1