Using Approximate Models for Efficient Exploration in Reinforcement Learning

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Model-based reinforcement learning, graph neural networks, intuitive physics, exploration
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We leverage structure to learn approximate dynamics models that generalise to unseen tasks and can be used to guide exploration, significantly improving the sample efficiency of policy learning within a model-based reinforcement learning framework.
Abstract: In model-based reinforcement learning, an agent uses a learned model of environment dynamics to improve a policy. A learned model of the environment has many uses: it can generate experience for policy learning, simulate potential outcomes during planning, and enable flexible adaptation to new tasks and goals without relearning the fundamentals of the environment from scratch. These sample-efficiency and generalisation gains are, however, limited by the model's accuracy: an imperfect model can lead to failure if the agent trusts it in regions of the state space where its predictions are inaccurate. It is well documented in cognitive and developmental psychology that humans navigate everyday scenarios using approximate intuitive models of physics. These intuitive models, despite being imperfect, enable humans to reason flexibly about abstract physical concepts (for example, gravity, collisions and friction) and to apply these concepts to novel problems without relearning them from scratch. In other words, humans make efficient use of imperfect models. In this paper, we learn dynamics models for intuitive physics tasks using graph neural networks that explicitly incorporate the abstract structure of objects, relations and events in their design. We demonstrate that these learned models generalise flexibly to unseen tasks and, despite being imperfect, improve the sample efficiency of policy learning by guiding exploration towards useful regions of the state and action space.
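As a rough illustration of the exploration-guidance idea described above (not the submission's code), the sketch below uses an intentionally imperfect learned dynamics model to rank candidate actions by the novelty of their predicted outcomes. All names (`ApproxDynamicsModel`, `novelty_bonus`, `choose_exploratory_action`) are hypothetical stand-ins, and a simple nearest-neighbour novelty bonus stands in for whatever exploration criterion the paper actually uses.

```python
"""Minimal sketch: guiding exploration with an approximate dynamics model.

Simulate each candidate action with the (imperfect) learned model and favour
the action whose predicted outcome is most novel; real experience would then
be gathered by executing that action in the environment.
"""
import numpy as np

rng = np.random.default_rng(0)


class ApproxDynamicsModel:
    """Stand-in for a learned (e.g. GNN-based) dynamics model.

    Here it is just a noisy random linear map, to emphasise that the
    model is approximate rather than exact.
    """

    def __init__(self, state_dim, action_dim):
        self.W = rng.normal(size=(state_dim, state_dim + action_dim))

    def predict(self, state, action):
        x = np.concatenate([state, action])
        return self.W @ x + 0.05 * rng.normal(size=state.shape)  # model error


def novelty_bonus(predicted_state, visited_states):
    """Simple novelty proxy: distance to the nearest previously seen state."""
    if not visited_states:
        return np.inf
    return min(np.linalg.norm(predicted_state - s) for s in visited_states)


def choose_exploratory_action(state, candidate_actions, model, visited):
    """Rank candidate actions by the novelty of their *predicted* outcomes."""
    scores = [novelty_bonus(model.predict(state, a), visited)
              for a in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]


# Toy usage: the imperfect model steers the agent towards unvisited regions.
state_dim, action_dim = 4, 2
model = ApproxDynamicsModel(state_dim, action_dim)
state, visited = np.zeros(state_dim), []
for _ in range(5):
    candidates = [rng.normal(size=action_dim) for _ in range(8)]
    action = choose_exploratory_action(state, candidates, model, visited)
    visited.append(state.copy())
    state = model.predict(state, action)  # in practice: env.step(action)
```

Even with a crude novelty bonus and a noisy model, the key point holds: the model only needs to be accurate enough to rank exploratory actions, not to predict dynamics exactly, which is why an imperfect model can still accelerate policy learning.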
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8036