Policy Gradient Optimization for Markov Decision Processes with Epistemic Uncertainty and General Loss Functions
Keywords: policy gradient, Markov Decision Process, epistemic uncertainty, Bayesian approach, general loss function, convex RL
Abstract: Motivated by many application problems, we consider Markov decision processes (MDPs) with a general loss function and unknown parameters. To mitigate the epistemic uncertainty associated with the unknown parameters, we take a Bayesian approach to estimate the parameters from data and impose a coherent risk functional (with respect to the Bayesian posterior distribution) on the general loss function. Since this formulation usually does not satisfy the interchangeability principle, it does not admit Bellman equations and cannot be solved by approaches based on dynamic programming. Therefore, we develop a policy gradient optimization approach to address this problem. We utilize the dual representation of the coherent risk measure and extend the envelope theorem to derive the policy gradient. Our extension of the envelope theorem from the discrete case to the continuous case may be of independent interest. We then show the convergence of the proposed algorithm with a convergence rate of $\mathcal{O}((1-\epsilon)^t)$, where $t$ is the number of policy gradient iterations and $\epsilon$ is the accuracy. We further extend our algorithm to an episodic setting, establish the consistency of the extended algorithm, and provide bounds on the number of iterations needed to achieve an error bound $\mathcal{O}(\epsilon)$ in each episode.
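The abstract's recipe can be illustrated on a toy problem. The sketch below is not the paper's algorithm; it is a minimal, assumption-laden illustration: a two-action problem whose loss vector is unknown, a posterior represented by samples, CVaR chosen as the coherent risk functional, and the gradient taken at the maximizing dual weights (the envelope-theorem idea). All names and the CVaR choice are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setting: a 2-action problem with an unknown loss vector.
# Epistemic uncertainty is captured by posterior samples of the loss parameters.
posterior_losses = rng.normal(loc=[1.0, 2.0], scale=0.5, size=(200, 2))

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def expected_loss(theta, losses):
    """Expected loss of a softmax policy under one posterior sample."""
    return losses @ softmax(theta)

def cvar_weights(values, alpha=0.9):
    """Dual-representation weights for CVaR_alpha: uniform mass on the
    worst (1 - alpha) fraction of posterior samples."""
    n = len(values)
    k = max(1, int(np.ceil((1 - alpha) * n)))
    worst = np.argsort(values)[-k:]
    w = np.zeros(n)
    w[worst] = 1.0 / k
    return w

def policy_gradient(theta, losses, alpha=0.9):
    """Gradient of the risk functional w.r.t. the softmax parameters,
    differentiating at the maximizing dual weights (envelope theorem)."""
    vals = np.array([expected_loss(theta, l) for l in losses])
    w = cvar_weights(vals, alpha)
    p = softmax(theta)
    avg_loss = w @ losses              # risk-weighted loss vector
    # d/d theta of (avg_loss @ softmax(theta))
    return p * (avg_loss - avg_loss @ p)

theta = np.zeros(2)
for _ in range(300):
    theta -= 0.5 * policy_gradient(theta, posterior_losses)

policy = softmax(theta)  # concentrates on the lower-loss action
```

Because the dual weights are held fixed at their maximizing value when differentiating, each step only requires a standard softmax-policy gradient against the risk-weighted loss; the choice of CVaR here stands in for the paper's general coherent risk functional.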
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4950