Abstract: We study the adaptation of soft actor-critic (SAC) from continuous to discrete action spaces. We revisit vanilla SAC and provide an in-depth analysis of its Q-value underestimation and performance instability when applied to discrete settings. We then propose entropy-penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on standard discrete-action benchmarks, including Atari games and a large-scale MOBA game, demonstrate the efficacy of our proposed method. Our code is available at: https://github.com/revisiting-sac/Revisiting-Discrete-SAC.git.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=7bF4oQ5hGr
Changes Since Last Submission: Revision in Response to Reviewer vWe7 and Reviewer NPAN.
Assigned Action Editor: ~Yu_Bai1
Submission Number: 1483