Tackling Underestimation Bias in Successor Features by Distributional Reinforcement Learning

Mengxiao Lu; Yirui Zhou; Huojun Hong; Yaxin Peng; Xiaofeng Zhang; Yangchun Zhang

15 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: Successor features, Distributional reinforcement learning, Underestimation bias

Abstract: The framework of successor features (SFs) and generalized policy improvement (GPI) has the potential to achieve zero-shot transfer across tasks in reinforcement learning (RL). In practice, however, GPI suffers from inaccurate value function approximation, which makes "zero-shot" transfer somewhat illusory. This paper investigates the underlying causes of inaccurate SFs and presents a method for improving their accuracy. Our contributions are fourfold: (i) we theoretically study the underestimation phenomenon in SF&GPI; (ii) we introduce distributional RL into SF&GPI and demonstrate its effectiveness in relieving this underestimation; (iii) we show that distributional SFs (DSFs) admit a lower generalization bound than the original SFs; (iv) we argue that the performance of SF-based algorithms can be enhanced by incorporating DSFs. Furthermore, we verify the quality of DSFs in the multi-objective RL (MORL) setting. Simulation studies demonstrate the advantage of our approach in addressing underestimation.
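
For readers unfamiliar with SF&GPI, the following is a minimal sketch of the standard GPI action-selection rule that the abstract builds on, not the authors' implementation. It assumes the usual linear form in which the value of policy i is the inner product of its successor features with a task weight vector. The function name `gpi_action`, the array layout, and the toy numbers are all hypothetical.

```python
import numpy as np

def gpi_action(psi, w):
    """Pick an action by generalized policy improvement (illustrative sketch).

    psi: array of shape (n_policies, n_actions, d), the successor features
         of each stored policy at the current state (assumed layout).
    w:   array of shape (d,), the reward weights of the target task.
    """
    # Q-value of every (policy, action) pair: q[i, a] = psi[i, a] . w
    q = psi @ w
    # GPI: for each action take the best value over stored policies,
    # then act greedily with respect to that maximum.
    best_over_policies = q.max(axis=0)
    return int(best_over_policies.argmax()), q

# Toy example with two stored policies, two actions, and d = 2 (made-up values).
psi = np.array([
    [[1.0, 0.0], [0.5, 0.5]],  # successor features under policy 0
    [[0.0, 1.0], [0.2, 0.9]],  # successor features under policy 1
])
w = np.array([0.0, 1.0])
action, q = gpi_action(psi, w)
```

Because the selected action depends only on the products of `psi` and `w`, any estimation error in the successor features carries over directly to the Q-values that GPI maximizes over.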

Supplementary Material: zip

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 79
