Plan for the Worst With Advice: Advice-Augmented Robust Markov Decision Processes

Published: 28 Nov 2025, Last Modified: 30 Nov 2025, NeurIPS 2025 Workshop MLxOR, CC BY 4.0
Keywords: Reinforcement Learning, Robustness, Advice-augmented
Abstract: We consider the integration of advice into Robust Markov Decision Processes (RMDPs). While the RMDP formulation helps model ambiguity in the transition dynamics, it is overly conservative due to its focus on worst-case instances. To move beyond the worst-case framework, we propose an advice-augmented setting in which the decision maker has access to advice in the form of a predicted transition kernel, which they seek to leverage to obtain better guarantees. The decision maker in this setting aims to find a policy that performs well under both the worst-case and the advice transition dynamics. Thus, we define \emph{robustness} and \emph{consistency} as the metrics the decision maker optimizes and propose a family of optimization problems whose solutions are Pareto-optimal with respect to robustness and consistency. Under standard assumptions on the ambiguity set, the optimal solutions are deterministic, Markovian, and stationary. Given a set of Pareto-optimal policies, we then provide a policy selection algorithm that achieves max-min optimality across robustness and consistency.
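To make the two criteria concrete, here is a minimal formalization, assuming value is measured by expected discounted return; the symbols below ($V^\pi_P$, $\mathcal{P}$, $\widehat{P}$, $\lambda$, $\Pi^\star$) are illustrative notation, not taken from the paper. Writing $V^\pi_P$ for the expected discounted return of policy $\pi$ under transition kernel $P$, with ambiguity set $\mathcal{P}$ and advice kernel $\widehat{P}$,
\[
\mathrm{Rob}(\pi) = \min_{P \in \mathcal{P}} V^\pi_P,
\qquad
\mathrm{Con}(\pi) = V^\pi_{\widehat{P}}.
\]
One natural family of scalarized problems whose solutions trace a robustness-consistency Pareto frontier is
\[
\max_{\pi} \; \lambda\,\mathrm{Rob}(\pi) + (1-\lambda)\,\mathrm{Con}(\pi),
\qquad \lambda \in [0,1],
\]
and, given the resulting set $\Pi^\star$ of Pareto-optimal policies, a selection rule in the spirit of the abstract's max-min guarantee picks
\[
\pi^\dagger \in \arg\max_{\pi \in \Pi^\star} \min\bigl\{\mathrm{Rob}(\pi),\, \mathrm{Con}(\pi)\bigr\}.
\]
In practice the two criteria may need to be placed on a common scale (e.g., via normalization or regret) before taking the minimum; the form above is only one plausible instantiation, not the paper's exact algorithm.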
Submission Number: 215