Abstract: Robust Markov decision processes (MDPs) provide a practical framework for generalizing trained agents to new environments. In this setting, the objective is to maximize performance under the worst-case model within a given uncertainty set. By construction, this raises a performance-robustness dilemma: accounting for too large an uncertainty set yields guarantees against larger disturbances but at the cost of overly conservative behavior, whilst too small a set may leave the agent over-sensitive to model misspecification. In this work, we introduce an online method that addresses the conservativeness of robust MDPs by strategically contracting the uncertainty set. First, we derive an explicit expression for the gradient of the robust return with respect to the uncertainty radius. This derivation enables us to prioritize where to reduce uncertainty and yields insights into the relation between the robust return and the uncertainty set. Second, we present a sampling-based algorithm that refines the uncertainty estimate with respect to the robust return. Third, we illustrate the effectiveness of our algorithm in a tabular environment.
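As a rough illustration of the idea only (not the paper's actual method), the sketch below adapts an uncertainty radius via gradient steps on a sampled estimate of the robust return's derivative with respect to that radius. The `robust_return` stand-in, the finite-difference estimator, and all hyperparameters are assumptions made purely for illustration; in the paper's setting the robust return would come from robust planning on the tabular MDP and the gradient from the explicit expression derived there.

```python
# Minimal sketch, assuming a scalar uncertainty radius and a noisy oracle
# for the robust return of a fixed policy. All names and values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def robust_return(radius: float) -> float:
    """Stand-in for the robust return under an uncertainty set of the given
    radius; a toy shape where a larger set admits worse adversarial models."""
    return 1.0 / (1.0 + radius) + 0.01 * rng.standard_normal()

def sampled_radius_gradient(radius: float, eps: float = 1e-2, n_samples: int = 16) -> float:
    """Finite-difference estimate of d(robust return)/d(radius),
    averaged over several noisy evaluations."""
    diffs = [
        (robust_return(radius + eps) - robust_return(radius - eps)) / (2.0 * eps)
        for _ in range(n_samples)
    ]
    return float(np.mean(diffs))

radius, step_size = 0.5, 0.05
for _ in range(50):
    grad = sampled_radius_gradient(radius)
    # Move the radius in the direction that improves the robust return
    # (here this contracts the set), keeping it non-negative.
    radius = max(0.0, radius + step_size * grad)

print(f"final radius: {radius:.3f}")
```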
Submission Number: 83