Keywords: model-based reinforcement learning, deep learning, generative agents, policy gradient, imitation learning
TL;DR: In this paper, we formulate a way to ensure consistency between the predictions of dynamics model and the real observations from the environment. Thus allowing the agent to learn powerful policies, as well as better dynamics models.
Abstract: Model-based reinforcement learning approaches have the promise of being sample efficient. Much of the progress in learning dynamics models in RL has been made by learning models via supervised learning. There is enough evidence that humans build a model of the environment, not only by observing the environment but also by interacting with the environment. Interaction with the environment allows humans to carry out "experiments": taking actions that help uncover true causal relationships which can be used for building better dynamics models. Analogously, we would expect such interaction to be helpful for a learning agent while learning to model the environment dynamics. In this paper, we build upon this intuition, by using an auxiliary cost function to ensure consistency between what the agent observes (by acting in the real world) and what it imagines (by acting in the ``learned'' world). Our empirical analysis shows that the proposed approach helps to train powerful policies as well as better dynamics models.