Keywords: deep reinforcement learning, batch reinforcement learning, buildings, HVAC
TL;DR: A unified method to incorporate rule-based policies into online and offline reinforcement learning algorithms
Abstract: Rule-based control (RBC) is widely adopted in buildings due to its stability and robustness. It resembles a behavior cloning methodology refined by human expertise. However, RBC is unlikely to exceed a reinforcement learning (RL) agent's performance, since it is difficult for hand-crafted rules to ingest a large number of parameters during decision-making. In this paper, we explore how to incorporate rule-based control into reinforcement learning to learn a more robust policy in both online and offline settings with a unified approach. We start with state-of-the-art online and offline RL methods, TD3 and TD3+BC, then improve on them using a dynamically weighted actor loss function that selectively chooses which policy the RL model should learn from at each time step of training. With experiments across multiple tasks and various weather conditions in both deterministic and stochastic scenarios, we empirically demonstrate that our dynamically weighted rule-based incorporated control regularization (RUBICON) method outperforms representative baseline methods in building-RL environments, under a reward combining thermal comfort and energy consumption, by 40.7% in offline settings and by 49.7% in online settings.
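The abstract's dynamically weighted actor loss can be illustrated with a minimal sketch. This is an assumed form, not the paper's actual implementation: it extends the TD3+BC-style actor objective with a regularization term toward the RBC policy's action, and a hypothetical weight `w` (which the paper computes dynamically per time step) selects how strongly the model learns from the rule-based policy versus the behavior data.

```python
import numpy as np

def rubicon_actor_loss(q_value, pi_action, rbc_action, behavior_action, w):
    """Hypothetical sketch of a dynamically weighted actor loss.

    q_value: critic's estimate Q(s, pi(s)) -- maximized by the actor
    pi_action: action proposed by the learned policy
    rbc_action: action the rule-based controller would take
    behavior_action: action from the offline dataset (BC target)
    w: dynamic weight in [0, 1]; w=1 regularizes fully toward RBC,
       w=0 toward behavior cloning (assumed weighting scheme)
    """
    pi_action = np.asarray(pi_action, dtype=float)
    rl_term = -q_value  # minimize negative Q, i.e. maximize Q
    rbc_term = np.mean((pi_action - np.asarray(rbc_action, dtype=float)) ** 2)
    bc_term = np.mean((pi_action - np.asarray(behavior_action, dtype=float)) ** 2)
    return rl_term + w * rbc_term + (1.0 - w) * bc_term

# Example: halfway weighting between RBC and behavior-cloning regularizers
loss = rubicon_actor_loss(
    q_value=1.0,
    pi_action=[0.5],
    rbc_action=[0.0],
    behavior_action=[1.0],
    w=0.5,
)
```

In an online setting the behavior-cloning term would typically be dropped (or replaced by replay-buffer actions), which is one way a single loss form can cover both regimes, as the unified approach in the abstract suggests.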
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)