Modeling the program of the world in reinforcement learning

Sep 29, 2021 · ICLR 2022 Conference Desk Rejected Submission
  • Keywords: Reinforcement Learning, Program Synthesis, Model-Based Reinforcement Learning, Forward Model
  • Abstract: In reinforcement learning, one way to reduce the number of interactions with the environment is to build an agent that learns and exploits the dynamics of the environment. Existing methods typically approximate the forward model of the environment with a neural network or with a model designed for one concrete task. We argue that "the world" within which the agent acts unfolds in different environments, or "scenarios", that differ in their layouts but follow the same high-level fundamental rules, which together constitute "the World Program". We propose to reveal and model these rules using program synthesis techniques. Our novel method, the World Programs Synthesizer, represents the latent dynamics of scenarios as a learnt program in a context-free language. We apply our method to games in the VGDL domain and demonstrate qualitative improvements over prior approaches to transferring knowledge between levels of a particular game.
  • One-sentence Summary: Modeling the latent dynamics of environments using Program Synthesis
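To make the "world program" idea concrete, here is a hypothetical sketch (not the paper's implementation, and all names are invented for illustration): the shared dynamics are written as a small set of condition-effect rules that predict the next state of a toy grid world, and two "scenarios" differ only in layout while obeying the same rules.

```python
# Hypothetical sketch of a "world program": high-level rules shared by
# all scenarios, applied to layouts that differ per level.

def step(state, action):
    """Predict the next state by applying the shared rules."""
    x, y = state["agent"]
    dx, dy = {"up": (0, -1), "down": (0, 1),
              "left": (-1, 0), "right": (1, 0)}[action]
    nx, ny = x + dx, y + dy
    # Rule 1: walls block movement.
    if (nx, ny) in state["walls"]:
        nx, ny = x, y
    new_state = dict(state, agent=(nx, ny))
    # Rule 2: reaching a goal ends the episode.
    new_state["done"] = (nx, ny) in state["goals"]
    return new_state

# Two scenarios with different layouts but identical rules.
level_a = {"agent": (0, 0), "walls": {(1, 0)}, "goals": {(0, 1)}, "done": False}
level_b = {"agent": (2, 2), "walls": set(), "goals": {(2, 3)}, "done": False}
```

A synthesizer in this spirit would search a space of such rule programs (rather than hand-writing them) so that the learnt program transfers across levels for free.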