PADDLE: Logic Program Guided Policy Reuse in Deep Reinforcement Learning

Published: 28 Oct 2023, Last Modified: 04 Dec 2023, GenPlan'23
Abstract: Learning new skills from previous experience is common in human life and is the core idea of Transfer Reinforcement Learning (TRL). This requires the agent to learn when to reuse a source policy, which source policy is best to reuse as the target task's policy, and how to reuse it. Most TRL methods learn, transfer, and reuse black-box policies, which 1) makes it hard to explain when to reuse, 2) makes it hard to explain which source policy is effective, and 3) reduces transfer efficiency. In this paper, we propose a novel TRL method called ProgrAm guiDeD poLicy rEuse (PADDLE) that can measure the logic similarities between tasks and transfer knowledge with interpretable cause-effect logic to the target task. To achieve this, we first propose a hybrid decision model that synthesizes high-level logic programs and learns a low-level DRL policy to learn multiple source tasks. Second, we estimate the logic similarity between the target task and the source tasks and combine it with the low-level policy similarity to select the appropriate source policy as the guiding policy for the target task. Experimental results show that our method can effectively select the appropriate source tasks to guide learning on the target task, outperforming black-box TRL methods.
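The selection step described in the abstract, which combines a logic similarity with a low-level policy similarity to choose a guiding source policy, might be sketched roughly as follows. The abstract does not say how either similarity is computed or how the two are combined, so everything here is an assumption for illustration: Jaccard overlap of rule sets stands in for logic similarity, cosine similarity of action distributions stands in for policy similarity, and the function names, the `weight` parameter, and the linear combination are all hypothetical rather than PADDLE's actual procedure.

```python
import math
from typing import Dict, FrozenSet, Sequence, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Stand-in logic similarity: overlap between two sets of rules."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Stand-in policy similarity: cosine of two action distributions."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_source(
    target_rules: FrozenSet[str],
    target_probs: Sequence[float],
    sources: Dict[str, Tuple[FrozenSet[str], Sequence[float]]],
    weight: float = 0.5,
) -> Tuple[str, Dict[str, float]]:
    """Score each source task and return the best one with all scores.

    The weighted sum is an assumed combination rule, not the paper's.
    """
    scores = {
        name: weight * jaccard(target_rules, rules)
        + (1.0 - weight) * cosine(target_probs, probs)
        for name, (rules, probs) in sources.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical source tasks: (rules of the logic program, action distribution).
example_sources = {
    "task_a": (frozenset({"r1", "r2"}), [0.5, 0.5]),
    "task_b": (frozenset({"r3"}), [1.0, 0.0]),
}
best_task, all_scores = select_source(
    frozenset({"r1", "r2"}), [0.5, 0.5], example_sources
)
```

In this toy setup the source whose rules and action distribution both match the target scores highest; in practice the weighting between the two similarities would need to be tuned or learned.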
Submission Number: 78