Keywords: LLM, Reinforcement Learning, Hierarchical Reinforcement Learning, Hierarchical Agents, Language Conditioned RL
TL;DR: Using LLMs to guide exploration in hierarchical agents by extracting common-sense priors
Abstract: Solving long-horizon, temporally extended tasks with Reinforcement Learning (RL) is extremely challenging, a difficulty compounded by the common practice of learning without prior knowledge (tabula rasa learning). Humans can generate and execute plans with temporally extended actions and learn to perform new tasks because we almost never solve problems from scratch. We want autonomous agents to have the same capabilities. Recently, LLMs have been shown to encode a tremendous amount of knowledge about the world and to exhibit impressive in-context learning and reasoning capabilities. However, using LLMs to solve real-world tasks is challenging because these models are not grounded in the current task. We want to leverage the planning capabilities of LLMs while using RL to provide the essential environment interaction. In this paper, we present a hierarchical agent that uses LLMs to solve long-horizon tasks. Instead of relying entirely on LLMs, we use them to guide the high-level policy, making it significantly more sample-efficient. We evaluate our method on simulation environments such as MiniGrid, SkillHack, and Crafter, and on a real robot arm in block manipulation tasks. We show that agents trained with our method outperform baseline methods and, once trained, do not depend on LLMs during deployment.
Submission Number: 90