Keywords: programmatic policies, reinforcement learning, options
TL;DR: We present a system that leverages human knowledge encoded in foundation models to provide agents with programmatic policies that encode "innate skills" in the form of temporally extended actions, or options.
Abstract: Outside of transfer learning settings, reinforcement learning agents start their learning process from a clean slate. As a result, they must go through a slow process of learning even the most obvious skills required to solve a problem. In this paper, we present InnateCoder, a system that leverages human knowledge encoded in foundation models to provide programmatic policies that encode "innate skills" in the form of temporally extended actions, or options. In contrast to existing approaches to learning options, InnateCoder learns them in a zero-shot setting from the general human knowledge encoded in foundation models, and not from the knowledge the agent gains by interacting with the environment. InnateCoder then searches for a programmatic policy by combining the programs encoding these options into a larger and more complex program. We hypothesized that InnateCoder's scheme of learning and using options could improve the sample efficiency of current methods for synthesizing programmatic policies. We evaluated our hypothesis in MicroRTS and Karel the Robot, two challenging domains. Empirical results support our hypothesis: InnateCoder is more sample efficient than versions of the system that either do not use options or learn them from experience. The policies InnateCoder learns are competitive with, and often outperform, current state-of-the-art agents in both domains.
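To make the two-stage pipeline sketched in the abstract concrete, here is a minimal, purely illustrative Python sketch of the loop it describes: query a foundation model zero-shot for option programs, then search over combinations of those programs for a full policy. Every name here (query_foundation_model, combine, evaluate, innate_coder_search) is a hypothetical placeholder, not the paper's actual interface; the real system searches a programmatic policy language rather than random subsets of callables.

```python
# Hypothetical sketch of the InnateCoder loop described in the abstract.
# All function names and bodies are illustrative stand-ins, not the
# authors' implementation.
import random


def query_foundation_model(task_description, n_programs):
    """Stand-in for a zero-shot foundation-model query that returns
    short programs (options) as callables. Trivial placeholders here."""
    return [lambda state, i=i: f"option-{i}({state})"
            for i in range(n_programs)]


def evaluate(policy):
    """Stand-in for rolling out a candidate policy in the environment
    and returning its average return (randomized placeholder)."""
    return random.random()


def combine(options, rng):
    """Compose a candidate policy from a random subset of options;
    InnateCoder instead combines option programs into a larger program."""
    chosen = rng.sample(options, k=rng.randint(1, len(options)))

    def policy(state):
        for opt in chosen:  # apply the chosen options in sequence
            state = opt(state)
        return state

    return policy


def innate_coder_search(task_description, budget=100, seed=0):
    """Generate options zero-shot, then search over their combinations."""
    rng = random.Random(seed)
    options = query_foundation_model(task_description, n_programs=5)
    best_policy, best_score = None, float("-inf")
    for _ in range(budget):
        candidate = combine(options, rng)
        score = evaluate(candidate)
        if score > best_score:
            best_policy, best_score = candidate, score
    return best_policy
```

Calling innate_coder_search("harvest resources") returns the best candidate found within the search budget; the key idea the sketch preserves is that the options come from the foundation model before any environment interaction, rather than from the agent's own experience.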
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3146