Policy Gradient for Reinforcement Learning with General Utilities

Published: 19 Mar 2024, Last Modified: 30 Apr 2024 · Tiny Papers @ ICLR 2024 · CC BY 4.0
Keywords: MDP, Convex MDP, Policy Gradient, RL
TL;DR: We present an efficient and easily implementable policy gradient for reinforcement learning with objectives that are non-linear functions of the occupancy measure.
Abstract: We derive a policy gradient theorem for reinforcement learning (RL) with an objective that is a general (non-linear and non-convex) function of the occupancy measure of the policy. This setting covers many problems in the literature, such as apprenticeship learning, pure exploration, and variational intrinsic control. Our policy gradient theorem shares the same elegance and ease of implementation as the standard policy gradient theorem, and generalizes easily to model-free settings suitable for large-scale problems.
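To make the idea concrete, here is a minimal sketch of a chain-rule policy gradient for a general utility, under our own assumptions (a tiny random tabular MDP, a softmax policy, and occupancy entropy as the utility, echoing the pure-exploration setting mentioned above). All names (`P`, `mu0`, `occupancy`, etc.) are illustrative, not taken from the paper: the key step is that the gradient of F(λ(θ)) equals the standard policy gradient computed with the frozen pseudo-reward r = ∂F/∂λ at the current occupancy λ.

```python
import numpy as np

# A tiny 2-state, 2-action MDP used purely for illustration; all names
# here are our own assumptions, not notation from the paper.
nS, nA, gamma = 2, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s'] transition probs
mu0 = np.array([0.6, 0.4])                      # initial state distribution

def policy(theta):
    """Softmax (tabular) policy pi[s, a] from logits theta."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def occupancy(theta):
    """Normalized discounted state-action occupancy lambda[s, a]."""
    pi = policy(theta)
    P_pi = np.einsum('sa,sat->st', pi, P)        # induced state-to-state kernel
    d_s = np.linalg.solve(np.eye(nS) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_s[:, None] * pi

def F(lam):
    """General (non-linear) utility: entropy of the occupancy measure."""
    return -np.sum(lam * np.log(lam + 1e-12))

def pg_general_utility(theta):
    """Chain-rule policy gradient: standard policy gradient computed with
    the frozen pseudo-reward r = dF/dlambda at the current occupancy."""
    lam = occupancy(theta)
    r = -(np.log(lam + 1e-12) + 1.0)             # gradient of entropy w.r.t. lambda
    pi = policy(theta)
    P_pi = np.einsum('sa,sat->st', pi, P)
    # Q-function of the frozen pseudo-reward via the Bellman equation.
    v = np.linalg.solve(np.eye(nS) - gamma * P_pi,
                        np.einsum('sa,sa->s', pi, r))
    q = r + gamma * np.einsum('sat,t->sa', P, v)
    d_s = np.linalg.solve(np.eye(nS) - gamma * P_pi.T, (1 - gamma) * mu0)
    # Softmax score function: grad wrt theta[s, b] is d(s) pi(b|s) (Q(s,b) - V(s)).
    return d_s[:, None] * pi * (q - (pi * q).sum(axis=1, keepdims=True))

# Sanity check against finite differences of F(occupancy(theta)).
theta = rng.standard_normal((nS, nA))
g = pg_general_utility(theta)
eps = 1e-6
for s in range(nS):
    for a in range(nA):
        tp = theta.copy(); tp[s, a] += eps
        tm = theta.copy(); tm[s, a] -= eps
        fd = (F(occupancy(tp)) - F(occupancy(tm))) / (2 * eps)
        assert abs(fd - g[s, a]) < 1e-4, (s, a, fd, g[s, a])
print("policy gradient matches finite differences")
```

Because the pseudo-reward is held fixed when differentiating, the inner computation is an ordinary (linear-reward) policy gradient, which is what makes the approach as easy to implement, and as amenable to model-free sampling, as the standard theorem.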
Submission Number: 3