Confidence Intervals for the Return Process in Markov Decision Processes

Published: 26 Feb 2026, Last Modified: 07 Mar 2026 · OpenReview Archive Direct Upload · Everyone · CC BY-NC 4.0
Abstract: In this work, we derive confidence intervals for the return process in discounted reward Markov Decision Processes with continuous state and action spaces. These confidence bounds depend only on the statistics of the value function, which may be derived using dynamic programming. In the two special cases of MDPs with uniformly bounded value functions and MDPs with linear structures, simpler confidence intervals are provided for the return process. Finally, we study the effect of epistemic uncertainty on the derived confidence intervals. Numerical examples are provided to show how these bounds may be used in practice.