Implicit Incremental Natural Actor Critic

Published: 01 Jan 2017, Last Modified: 30 Sep 2024 · ICONIP (1) 2017 · CC BY-SA 4.0
Abstract: The natural policy gradient (NPG) method is a promising approach to finding a locally optimal policy parameter. The NPG method has demonstrated remarkable success in many fields, including large-scale applications. On the other hand, estimating the NPG itself requires an enormous number of samples, and incremental estimation of the NPG is computationally unstable. In this work, we propose a new incremental and stable algorithm for NPG estimation. The proposed algorithm is based on the idea of implicit temporal differences, and we call it implicit incremental natural actor critic (I2NAC). Theoretical analysis indicates the stability of I2NAC and the instability of conventional incremental NPG methods. Numerical experiments show that I2NAC is less sensitive to the value of the step size.
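The abstract only names the underlying idea, so as a rough illustration, here is a minimal sketch of an implicit TD(0) update for a linear critic, the kind of "implicit temporal differences" update the method builds on. This is not the authors' I2NAC algorithm itself; all names and the feature setup are assumptions for illustration. The key point is that solving the fixed-point form of the update in closed form (via the Sherman-Morrison identity) shrinks the step size by a data-dependent factor, which is what makes the update stable for large step sizes.

```python
import numpy as np

def implicit_td0_update(theta, phi, reward, phi_next, alpha, gamma):
    """One implicit TD(0) step for a linear value function v(s) = theta @ phi(s).

    The implicit update defines theta_{t+1} through itself:
        theta_{t+1} = theta_t + alpha * (r + gamma * theta_t @ phi' - theta_{t+1} @ phi) * phi
    Solving this linear equation (Sherman-Morrison) gives the closed form below.
    The effective step size alpha / (1 + alpha * ||phi||^2) is bounded above by
    1 / ||phi||^2 no matter how large alpha is, which is the source of the
    method's reduced sensitivity to the step-size choice.
    """
    td_error = reward + gamma * (theta @ phi_next) - theta @ phi
    effective_alpha = alpha / (1.0 + alpha * (phi @ phi))
    return theta + effective_alpha * td_error * phi

# Illustrative single step with a deliberately huge step size (hypothetical data):
theta = np.zeros(2)
phi = np.array([1.0, 0.5])       # features of the current state
phi_next = np.array([0.5, 1.0])  # features of the next state
theta_new = implicit_td0_update(theta, phi, reward=1.0, phi_next=phi_next,
                                alpha=100.0, gamma=0.9)
# An explicit TD step with alpha=100 would move theta by 100*phi here;
# the implicit step stays bounded.
```

A plain (explicit) incremental update would multiply the TD error by the raw step size `alpha`, so a large `alpha` can make the parameters diverge; the implicit form damps exactly that failure mode.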