The Point to Which Soft Actor-Critic Converges

01 Mar 2023 (modified: 30 May 2023) · Submitted to Tiny Papers @ ICLR 2023
Keywords: maximum entropy, soft policy iteration
TL;DR: We prove that SAC converges to the same point as SQL in the limit.
Abstract: Soft actor-critic (SAC) is a successful successor to soft Q-learning (SQL). While both live under the maximum entropy framework, their relationship has remained unclear. In this paper, we prove that in the limit they converge to the same solution. This is appealing since it translates the optimization from an arduous task into an easier one. The same justification also applies to other regularizers, such as the KL divergence.
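For context, a sketch of the objective both algorithms share: this is the standard maximum-entropy RL formulation with temperature \alpha, and the Boltzmann policy below is the well-known soft-optimal solution from the literature, not a formula reproduced from this paper's own derivation.

```latex
% Standard maximum-entropy objective with temperature \alpha:
\pi^{*} \;=\; \arg\max_{\pi}\; \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}}
  \bigl[\, r(s_t, a_t) + \alpha\, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \,\bigr]

% Both SQL and SAC target the soft-optimal (Boltzmann) policy
% induced by the soft Q-function:
\pi^{*}(a \mid s) \;\propto\; \exp\!\bigl( Q^{*}_{\mathrm{soft}}(s, a) / \alpha \bigr)
```

The paper's claim, under this formulation, is that SAC's iterates reach the same fixed point as SQL's in the limit.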