[Re] Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction

02 Dec 2019 (modified: 05 May 2023) · NeurIPS 2019 Reproducibility Challenge Blind Report
Abstract: We reproduce the main results of the paper Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, covering the performance of the baseline algorithms as well as BEAR-QL. We analyze our results against those reported in the paper, empirically show that BEAR-QL can learn from randomly collected data and reaches optimal or near-optimal performance when trained on optimal- or medium-quality datasets across several continuous control tasks, and offer practical suggestions for reproducing the paper's results.
Track: Replicability
NeurIPS Paper Id: https://openreview.net/forum?id=H1xutHSxLS&noteId=S1gUDPe0tr
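
The BEAR-QL algorithm named in the abstract keeps the learned policy within the support of the dataset's behavior policy via a sampled maximum mean discrepancy (MMD) penalty between actions drawn from the two policies. Below is a minimal NumPy sketch of that sampled MMD; the Gaussian kernel and the bandwidth value are assumptions chosen for illustration (the original work tunes the kernel and bandwidth per task).

```python
import numpy as np

def gaussian_kernel(x, y, sigma=20.0):
    # Pairwise Gaussian kernel values between two sets of action samples,
    # shapes (n, d) and (m, d) -> (n, m).
    diff = x[:, None, :] - y[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))

def mmd_squared(policy_actions, behavior_actions, sigma=20.0):
    # Sampled squared MMD between actions from the learned policy and
    # actions from the (unknown) behavior policy that produced the dataset.
    # BEAR-QL penalizes policy updates when this quantity grows too large.
    k_pp = gaussian_kernel(policy_actions, policy_actions, sigma).mean()
    k_pb = gaussian_kernel(policy_actions, behavior_actions, sigma).mean()
    k_bb = gaussian_kernel(behavior_actions, behavior_actions, sigma).mean()
    return k_pp - 2.0 * k_pb + k_bb

# Example: the penalty is near zero when both sample sets come from the
# same distribution, and grows as the learned policy drifts off-support.
rng = np.random.default_rng(0)
pi_actions = rng.normal(size=(16, 6))    # sampled from the learned policy
beta_actions = rng.normal(size=(16, 6))  # actions stored in the offline dataset
print(mmd_squared(pi_actions, beta_actions))
```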