Percentile Criterion Optimization in Offline Reinforcement Learning

Cyrus Cousins; Elita Lobo; Marek Petrik; Yair Zick

Published: 21 Sept 2023, Last Modified: 16 Jan 2024, NeurIPS 2023 poster

Keywords: Reinforcement Learning, Bayesian Uncertainty, Robustness

Abstract: In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion. The percentile criterion is optimized by constructing an uncertainty set that contains the true model with high probability and optimizing the policy for the worst model in the set. Since the percentile criterion is non-convex, constructing these sets is itself challenging. Existing works use Bayesian credible regions as uncertainty sets, but these are often unnecessarily large and result in overly conservative policies. To overcome these shortcomings, we propose a novel Value-at-Risk-based dynamic programming algorithm to optimize the percentile criterion without explicitly constructing any uncertainty sets. Our theoretical and empirical results show that our algorithm implicitly constructs much smaller uncertainty sets and learns less conservative robust policies.
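
To make the idea of a Value-at-Risk based dynamic programming update concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a small tabular MDP with known rewards, transition uncertainty represented by K posterior samples, and an empirical lower quantile as the Value-at-Risk estimate. The names `var_value_iteration`, `P`, `r`, `gamma`, and `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np

def var_value_iteration(P, r, gamma, alpha, iters=200):
    """Illustrative VaR-style value iteration (a sketch, not the paper's method).

    P:     array of shape (K, S, A, S), K sampled transition models (assumed input)
    r:     array of shape (S, A), rewards assumed known
    gamma: discount factor in [0, 1)
    alpha: quantile level in (0, 1]; smaller values are more pessimistic
    """
    K, S, A, _ = P.shape
    # Index of the empirical lower alpha-quantile among K sorted samples.
    k = max(int(np.ceil(alpha * K)) - 1, 0)
    v = np.zeros(S)
    q = np.zeros((S, A))
    for _ in range(iters):
        # One-step backup under every sampled model: shape (K, S, A).
        q_samples = r[None, :, :] + gamma * (P @ v)
        # Per state-action pair, take the empirical alpha-quantile over models.
        q = np.sort(q_samples, axis=0)[k]
        # Act greedily with respect to the pessimistic estimates.
        v = q.max(axis=1)
    return v, q.argmax(axis=1)
```

In this sketch no uncertainty set is built explicitly: the quantile is taken separately for each state-action pair at every backup. When all sampled models coincide, the update reduces to standard value iteration.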

Supplementary Material: zip

Submission Number: 5368