Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

Published: 12 Feb 2024, Last Modified: 06 Mar 2024, Venue: ICAPS 2024, License: CC BY 4.0
Keywords: Reinforcement Learning, Reward Machines, Composable RL
Abstract: Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a set of local policies that each solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these local policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine local policies via planning, our method asymptotically attains global optimality, even in stochastic environments.
Primary Keywords: Learning
Category: Long
Student: Graduate
Submission Number: 200
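
Illustrative sketch: the abstract describes combining local policies learned with successor features. As a rough illustration of that general idea, and not the authors' algorithm, the snippet below shows how successor features let a fixed set of policies be re-evaluated for a new reward weight vector without further learning, using standard generalized policy improvement (GPI). It assumes rewards are linear in some feature vector, so that Q(s, a) = psi(s, a) · w. The function name `gpi_action`, the toy numbers, and the weight vector `w` are hypothetical; how the weights would be derived from the FSA task description is outside this sketch.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gpi_action(
    psi: Sequence[Sequence[Sequence[float]]],
    w: Sequence[float],
) -> Tuple[int, List[float]]:
    """Pick an action by generalized policy improvement (GPI).

    psi[i][a] is the successor-feature vector of basis policy i for
    action a at the current state (hypothetical layout for this sketch).
    w is the reward weight vector of the task at hand.
    Returns the greedy action and, per action, the best value over policies.
    """
    n_actions = len(psi[0])
    # Q_i(s, a) = psi_i(s, a) . w; keep the best basis policy for each action.
    q_max = [
        max(dot(policy_sf[a], w) for policy_sf in psi)
        for a in range(n_actions)
    ]
    best_action = max(range(n_actions), key=lambda a: q_max[a])
    return best_action, q_max


# Toy state with two basis policies, two actions, two features (made-up values).
psi = [
    [[1.0, 0.0], [0.5, 0.5]],  # basis policy 0
    [[0.0, 1.0], [0.2, 0.9]],  # basis policy 1
]
w = [0.0, 1.0]  # assumed task weights: only the second feature is rewarded
action, q = gpi_action(psi, w)
```

Changing `w` changes the selected action without relearning `psi`, which is the property that makes a fixed policy basis reusable across tasks.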