Abstract: We examine a novel setting in which two parties
have partial knowledge of the elements that make up a Markov
Decision Process (MDP) and must cooperate to compute and
execute an optimal policy for the problem constructed from
those elements. This situation arises when one party wants to
give a robot some task, but does not wish to divulge those
details to a second party—while the second party possesses
sensitive data about the robot’s dynamics (information needed
for planning). Both parties want the robot to perform the
task successfully, but neither is willing to disclose any more
information than is absolutely necessary. We use techniques
from secure multi-party computation, combining cryptographic
primitives and algorithms to construct protocols that compute
an optimal policy while keeping that policy opaque
by splitting it across both parties. To execute a split policy,
we also give a protocol that enables the robot to determine
what actions to trigger, while the second party guards against
attempts to probe for information inconsistent with the policy’s
prescribed execution. To improve scalability, we find
that basis functions and constraint sampling methods are useful
for forming effective approximate MDPs. We report simulation
results examining performance and precision, and assess the
scaling properties of our Python implementation. We also
describe a hardware proof-of-feasibility implementation using
inexpensive physical robots; as a small-scale instance, this
case can be solved directly.