Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Transfer reinforcement learning, Successor feature
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: This paper explores knowledge transfer using successor features (SFs) in reinforcement learning (RL) scenarios where the reward function changes across tasks while the environment's dynamics remain the same. Under this framework, the Q-function of a task can be decomposed into a successor feature and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the search space for finding the optimal Q-function when transferring knowledge from one task to another that shares the same transition dynamics. Because the optimal policy can be derived directly from the optimal Q-function, the SF & GPI framework promises greater efficiency and effectiveness in decision-making than traditional RL methods such as Q-learning. However, despite the superior performance of SF & GPI observed in numerical experiments, its theoretical foundations remain largely unestablished, especially when successor features are learned with deep neural networks in conjunction with a deep Q-network (SF-DQN). To the best of our knowledge, this paper provides the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. Moreover, our theoretical results reveal that SF-DQN with GPI significantly accelerates policy transfer across tasks and indicate that SF decomposition outperforms non-representation-learning approaches, such as the deep Q-network (DQN), with both a faster convergence rate and improved generalization. Numerical experiments on real RL tasks support the superior performance of SF-DQN with GPI and align quantitatively with our theoretical findings.
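
For readers unfamiliar with the SF & GPI framework referenced in the abstract, the standard successor-feature formulation from the literature is sketched below. This is background notation only, not an excerpt from the submission; the symbols \phi (feature map), \psi (successor feature), and w (task weights) follow the usual conventions and are assumptions here.

% Reward is assumed linear in a shared feature map \phi, with task-specific weights w:
r(s, a, s') = \phi(s, a, s')^{\top} w

% The successor feature of a policy \pi is the expected discounted sum of features:
\psi^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t, a_t, s_{t+1}) \,\middle|\, s_0 = s,\, a_0 = a \right]

% The Q-function then factors into a dynamics-dependent part (\psi) and a task-dependent part (w):
Q^{\pi}(s, a) = \psi^{\pi}(s, a)^{\top} w

% Generalized policy improvement (GPI): act greedily over all previously learned policies \pi_1, ..., \pi_n:
\pi(s) \in \arg\max_{a} \max_{1 \le i \le n} Q^{\pi_i}(s, a)

Under this factorization, transferring to a new task with the same dynamics amounts to estimating a new weight vector w while reusing the learned successor features, which is the setting the paper analyzes for SF-DQN.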
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4110