Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning

TMLR Paper 5645 Authors

15 Aug 2025 (modified: 01 Sept 2025) · Under review for TMLR · CC BY 4.0
Abstract: Natural gradients have long been studied in deep reinforcement learning due to their fast convergence properties and covariant weight updates. However, computing natural gradients requires inverting the Fisher Information Matrix (FIM) at each iteration, which is computationally prohibitive. In this paper, we present an efficient and scalable natural policy optimization technique that leverages a rank-1 approximation of the full inverse FIM. We show theoretically that, under certain conditions, the rank-1 approximation of the inverse FIM converges faster than vanilla policy gradients and, under further conditions, enjoys the same sample complexity as stochastic policy gradient methods. We benchmark our method on a diverse set of environments and show that it outperforms standard trust-region and actor-critic baselines.
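The abstract does not specify how the rank-1 inverse is constructed, so the following is only a minimal sketch of why such an approximation makes the preconditioning step cheap. It assumes a damped rank-1 surrogate of the empirical Fisher, a power-iteration estimate of its top eigenpair, and a Sherman-Morrison closed-form inverse; none of these choices are confirmed by the paper, and the function name and the damping parameter `delta` are hypothetical.

```python
# Illustrative sketch (not the paper's exact method): approximate the Fisher by
#   F_hat = lam * v v^T + delta * I
# and apply its inverse to the policy gradient via Sherman-Morrison,
# avoiding any explicit d x d matrix inversion.
import numpy as np

def rank1_natural_gradient(score_samples, policy_grad, delta=1e-3, iters=50):
    """score_samples: (N, d) array of per-sample grad-log-prob vectors.
    policy_grad: (d,) vanilla policy gradient.
    Returns an approximate natural gradient F_hat^{-1} @ policy_grad."""
    S = np.asarray(score_samples)
    n, d = S.shape
    # Power iteration for the top eigenpair of the empirical Fisher S^T S / n,
    # using only matrix-vector products (never forming the d x d matrix).
    rng = np.random.default_rng(0)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        w = S.T @ (S @ v) / n              # w = F v
        lam = float(v @ w)                 # Rayleigh quotient ~ top eigenvalue
        v = w / (np.linalg.norm(w) + 1e-12)
    # Sherman-Morrison: (delta I + lam v v^T)^{-1} g
    #   = g / delta - (lam * (v @ g) / (delta * (delta + lam))) * v
    g = np.asarray(policy_grad)
    coeff = lam * (v @ g) / (delta * (delta + lam))
    return g / delta - coeff * v
```

The damping term keeps the surrogate invertible even though the rank-1 part alone is singular; the cost per update is O(N d) for the power iteration and O(d) for the closed-form solve, in contrast with the O(d^3) cost of inverting the full FIM.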
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Alp_Kucukelbir1
Submission Number: 5645