Keywords: reinforcement learning, neuro-symbolic planner, differentiable symbolic planner, differentiable reasoning, reward shaping
TL;DR: We introduce Dylan, a novel framework that uses human priors (e.g., "keys open door") to help RL agents learn with fewer training interactions. It can also serve as a differentiable planner, composing logic options to synthesize novel behaviors.
Abstract: When tackling complex problems, humans naturally break them down into smaller, manageable subtasks and adjust their initial plans based on observations. For instance, if you want to make coffee at a friend’s place, you might initially plan to grab coffee beans and go to the coffee machine. Upon noticing that the machine is already full, you would skip the initial steps and proceed directly to brewing. In stark contrast, state-of-the-art reinforcement learners, such as Proximal Policy Optimization (PPO), lack such prior knowledge and therefore require significantly more training steps to exhibit comparable adaptive behavior. Thus, a central research question arises: \textit{How can we equip reinforcement learning (RL) agents with similar ``human'' priors, allowing them to learn with fewer training interactions?} To address this challenge, we propose the \textbf{d}ifferentiable s\textbf{y}mbolic p\textbf{lan}ner (Dylan), a novel framework that integrates symbolic planning into reinforcement learning. Dylan serves as a reward model that dynamically shapes rewards by leveraging human priors, guiding agents through intermediate subtasks and thus enabling more efficient exploration. Beyond reward shaping, Dylan can act as a high-level planner that composes (logic) options to generate new behaviors while avoiding common symbolic planner pitfalls such as infinite execution loops. Our experimental evaluations demonstrate that Dylan significantly improves RL agents' performance and facilitates generalization to unseen tasks.
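To make the reward-shaping idea concrete, here is a minimal, hypothetical sketch (not the authors' implementation): a symbolic plan of ordered subtask predicates, encoding a human prior like "keys open door", drives a potential-based shaping term added to the environment reward. All names (`SymbolicShaper`, `plan`, the state dictionaries) are illustrative assumptions.

```python
# Hedged sketch: potential-based reward shaping driven by a symbolic
# subtask plan, in the spirit of Dylan's reward model. Not the paper's code.

class SymbolicShaper:
    """Shapes rewards by tracking progress through an ordered subtask plan."""

    def __init__(self, plan, gamma=0.99, bonus=1.0):
        self.plan = plan        # ordered subtask predicates (human priors)
        self.gamma = gamma      # discount factor, matching the RL agent's
        self.bonus = bonus      # potential gained per completed subtask
        self.progress = 0       # index of the next unsatisfied subtask

    def potential(self):
        # Potential grows with the number of completed subtasks.
        return self.bonus * self.progress

    def shape(self, env_reward, next_state):
        phi_before = self.potential()
        # Advance through consecutively satisfied subtasks in the plan.
        while self.progress < len(self.plan) and self.plan[self.progress](next_state):
            self.progress += 1
        phi_after = self.potential()
        # Potential-based shaping: r' = r + gamma * phi(s') - phi(s),
        # which preserves the optimal policy (Ng et al., 1999).
        return env_reward + self.gamma * phi_after - phi_before


# Toy "keys open door" prior: first pick up the key, then open the door.
plan = [lambda s: s["has_key"], lambda s: s["door_open"]]
shaper = SymbolicShaper(plan)

r1 = shaper.shape(0.0, {"has_key": True, "door_open": False})  # key subtask done
r2 = shaper.shape(0.0, {"has_key": True, "door_open": True})   # door subtask done
```

Because the shaping term is potential-based, the intermediate bonuses guide exploration toward the subgoals without changing which policies are optimal; a plan that is partially satisfied at reset (the "machine is already full" case) is simply skipped over by the progress check.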
Supplementary Material: zip
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 10270