Abstract: We examine a novel setting in which two parties
have partial knowledge of the elements that make up a Markov
Decision Process (MDP) and must cooperate to compute and
execute an optimal policy for the problem constructed from
those elements. This situation arises when one party wants to
give a robot some task, but does not wish to divulge those
details to a second party—while the second party possesses
sensitive data about the robot’s dynamics (information needed
for planning). Both parties want the robot to perform the
task successfully, but neither is willing to disclose any more
information than is absolutely necessary. We use techniques
from secure multi-party computation, combining cryptographic
primitives and algorithms to construct protocols that compute
an optimal policy while keeping that policy opaque
by splitting it across both parties. To execute a split policy,
we also give a protocol that enables the robot to determine
what actions to trigger, while the second party guards against
attempts to probe for information inconsistent with the policy’s
prescribed execution. To improve scalability, we find
that basis functions and constraint sampling methods are useful
for forming effective approximate MDPs. We report simulation
results examining performance and precision, and assess the
scaling properties of our Python implementation. We also
describe a hardware proof-of-feasibility implementation using
inexpensive physical robots; as a small-scale instance, this
case can be solved directly.