Unifying Model-Based and Model-Free Reinforcement Learning with Equivalent Policy Sets

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: model-based reinforcement learning, model-free reinforcement learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We combine model-based and model-free RL by using model-free RL to search over the set of policies that are not provably suboptimal according to a model.
Abstract: Model-based and model-free reinforcement learning (RL) each possess relative strengths that prevent either approach from strictly dominating the other. Model-based RL often offers greater data efficiency, as it can use models to evaluate many possible behaviors before choosing one to enact. However, because models cannot perfectly represent complex environments, agents that rely too heavily on models may suffer from poor asymptotic performance. Model-free RL avoids this problem at the expense of data efficiency. In this work, we seek a unified approach to RL that combines the strengths of both. To this end, we propose *equivalent policy sets* (EPS), a novel tool for quantifying the limitations of models for the purposes of decision making. Based on this concept, we propose *Unified RL*, a novel RL algorithm that uses models to constrain model-free RL to the set of policies that are not provably suboptimal, according to model-based bounds on policy performance. We demonstrate across a range of benchmarks that Unified RL effectively combines the relative strengths of model-based and model-free RL: it matches the data efficiency of model-based RL, exceeds the data efficiency of model-free RL, and achieves asymptotic performance comparable or superior to that of model-free RL. Additionally, we show that Unified RL outperforms a number of existing state-of-the-art model-based and model-free RL algorithms, and can learn effective policies in situations where either model-free or model-based RL alone fails.
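The paper itself is not reproduced on this page, but as a rough sketch of the idea stated in the TL;DR and abstract: suppose the learned model yields a lower bound $L(\pi)$ and an upper bound $U(\pi)$ on the true return $J(\pi)$ of each policy $\pi$. A policy is provably suboptimal when some other policy's lower bound exceeds its upper bound, and the equivalent policy set collects everything the model cannot rule out. The formalization below is an illustrative assumption, not necessarily the authors' exact definition.

```latex
% Illustrative sketch only: L(\pi) \le J(\pi) \le U(\pi) are assumed
% model-based bounds on true performance; the paper's precise
% construction of the equivalent policy set may differ.
\Pi_{\mathrm{EPS}}
  \;=\;
  \bigl\{\, \pi \in \Pi \;:\; U(\pi) \,\ge\, \sup_{\pi' \in \Pi} L(\pi') \,\bigr\}
```

Under this reading, Unified RL would run its model-free policy search inside $\Pi_{\mathrm{EPS}}$: the model can only exclude behaviors it can provably show are worse than some alternative, so when the model is inaccurate its bounds loosen, the set grows, and the search degrades gracefully toward unconstrained model-free RL.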
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8201