Tackling Underestimation Bias in Successor Features by Distributional Reinforcement Learning

15 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Successor features, Distributional reinforcement learning, Underestimation bias
Abstract: The framework of successor features (SFs) and generalized policy improvement (GPI) has the potential to achieve zero-shot transfer across tasks in reinforcement learning (RL). In practice, however, GPI suffers from inaccurate value function approximation, which makes true "zero-shot" transfer hard to realize. This paper investigates the underlying causes of inaccurate SFs and presents a method for improving their accuracy. Our contributions are fourfold: (i) we theoretically study the underestimation phenomenon in SF&GPI; (ii) we introduce distributional RL into SF&GPI and demonstrate its effectiveness in relieving this underestimation; (iii) we show that distributional SFs (DSFs) admit a lower generalization bound than the original SFs; (iv) we argue that the performance of SF-based algorithms can be enhanced by incorporating DSFs. Furthermore, we evaluate DSFs in the multi-objective RL (MORL) setting. Simulation studies demonstrate the advantage of our approach in addressing underestimation.
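For readers unfamiliar with the SF and GPI framework the abstract refers to, the following is a minimal sketch of the standard (non-distributional) formulation, in which the value of action a under source policy i on a task with reward weights w is the inner product of the successor features psi_i(s, a) with w, and GPI acts greedily with respect to the maximum over source policies. It does not implement the distributional method proposed in the submission. The function names `q_values` and `gpi_action` and the toy numbers are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def q_values(psi: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Return Q(s, a) = psi(s, a) . w for each action a.

    psi is indexed as [action][feature] for one fixed state and one policy.
    """
    return [sum(p * wi for p, wi in zip(psi_a, w)) for psi_a in psi]


def gpi_action(psis: Sequence[Sequence[Sequence[float]]], w: Sequence[float]) -> int:
    """Generalized policy improvement at one state.

    psis is indexed as [policy][action][feature]. The chosen action is
    argmax over actions of the max over source policies of psi_i(s, a) . w.
    """
    per_policy = [q_values(psi, w) for psi in psis]
    n_actions = len(per_policy[0])
    best = [max(q[a] for q in per_policy) for a in range(n_actions)]
    return max(range(n_actions), key=best.__getitem__)


# Toy successor features (made-up numbers): 2 source policies, 2 actions, 2 features.
psis = [
    [[1.0, 0.0], [0.0, 0.5]],  # source policy 0
    [[0.2, 0.2], [0.0, 2.0]],  # source policy 1
]

# A new task is specified only by its reward weights w; no further learning is done.
w_new = [0.0, 1.0]
print(gpi_action(psis, w_new))
```

Because the greedy action depends on the estimated successor features, any systematic error in those estimates carries over directly into the GPI policy, which is the accuracy concern the abstract raises.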
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 79