Abstract: Successor Features (SFs), combined with Generalized Policy Improvement (GPI), form a classic transfer Reinforcement Learning (RL) framework that transfers knowledge by decoupling the policy from the task. However, SFs are value-based and cannot handle environments with continuous action spaces, since GPI transfers knowledge by enumerating all possible actions, which is infeasible in such settings. Recently, PeSFA decoupled SFs from policies and further endowed SFs with generalization capability in the policy space, but it still cannot be applied to continuous action spaces. In this paper, we introduce the Continuous PeSFA (CPeSFA) algorithm, an Actor-Critic (AC) architecture designed to learn and transfer policies in continuous action spaces. Our theoretical analysis shows that CPeSFA leverages the generalization of SFs in the policy space to accelerate learning. Experimental results in the Grid World, Reacher, and Point Maze environments demonstrate CPeSFA's superiority and its effective knowledge transfer for rapid policy learning on new tasks.
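To make the abstract's central limitation concrete, below is a minimal sketch (not from the paper) of the standard SF+GPI evaluation step: with successor features, the Q-value of source policy i on a new task with reward weights w is Q_i(s, a) = psi_i(s, a) · w, and GPI acts by maximizing over all source policies and all actions. The names `psi`, `w_new`, and the array shapes are illustrative assumptions; the explicit enumeration over actions in the final line is exactly what breaks when the action space is continuous.

```python
import numpy as np

# Illustrative dimensions (assumed, not from the paper):
# 3 source policies, 5 discrete actions, 4-dimensional reward features.
N_POLICIES, N_ACTIONS, D_FEATURES = 3, 5, 4
rng = np.random.default_rng(0)

# psi[i, a] stands in for the learned successor features of source
# policy i at a fixed state s, for action a.
psi = rng.normal(size=(N_POLICIES, N_ACTIONS, D_FEATURES))
w_new = rng.normal(size=D_FEATURES)  # reward weights of the new task

q = psi @ w_new                      # Q_i(s, a) = psi_i(s, a) . w_new
gpi_action = q.max(axis=0).argmax()  # argmax_a max_i Q_i(s, a)
print("GPI action at state s:", gpi_action)
```

The `argmax` over a finite action axis is only possible because the actions can be enumerated; CPeSFA's actor-critic design replaces this enumeration with a learned actor, which is what enables transfer in continuous action spaces.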
Format: Long format (up to 8 pages + refs, appendix)
Publication Status: No
Submission Number: 1