Computing Policies That Account for the Effects of Human Uncertainty During Execution in Markov Decision Processes
Keywords: Human-aware AI, Markov Decision Processes
TL;DR: Computing MDP policies that account for the effects of a human agent's state uncertainty during policy execution.
Abstract: When humans are given a policy to execute, uncertainty in identifying a state can lead to execution errors and deviations from the policy. This can happen due to the human agent's cognitive limitations and/or perceptual errors. An algorithm that computes a policy for a human to execute therefore ought to consider these effects in its computations. An optimal Markov Decision Process (MDP) policy that is poorly executed by a human agent may be much worse than another policy that is suboptimal in the MDP but accounts for the human agent's execution behavior. In this paper we consider two problems that arise from state uncertainty: erroneous state inference, and extra sensing actions that a person might take as a result of their uncertainty. We present an approach to model the human agent's behavior with respect to state uncertainty, which can then be used to compute MDP policies that account for these problems. We then present a hill-climbing algorithm that searches for good policies given our model of the human agent, and a branch-and-bound algorithm that can find the optimal policy for such problems. We show experimental results in a Gridworld domain and a warehouse-worker domain. Finally, we present human-subject studies that support the assumptions of our human model.
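The claim that an MDP-optimal policy can be worse than a suboptimal one once execution errors are considered can be illustrated with a minimal sketch. Everything below is a hypothetical illustration, not the paper's model or algorithms: the three-state MDP, the confusion matrix `C` (the probability that the true state is perceived as another state), and the functions `evaluate` and `hill_climb` are assumptions made for this example. The sketch covers only erroneous state inference and does not model extra sensing actions.

```python
# Hypothetical toy example: policy evaluation under state confusion,
# followed by a simple single-state-change hill climb over policies.

def evaluate(policy, P, R, C, gamma=1.0, iters=50):
    """Return state values when a human executes `policy` with confusion C.

    P[s][a] is a list of (next_state, prob), R[s][a] is the reward, and
    C[s][s2] is the assumed probability that true state s is perceived as s2,
    in which case the human executes policy[s2] while actually being in s.
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        new_V = []
        for s in range(n):
            v = 0.0
            for s2 in range(n):
                a = policy[s2]  # action chosen for the perceived state
                q = R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
                v += C[s][s2] * q
            new_V.append(v)
        V = new_V
    return V

def hill_climb(policy, P, R, C, start=0, n_actions=2):
    """Greedy local search: change one state's action while the start value improves."""
    best = list(policy)
    best_val = evaluate(best, P, R, C)[start]
    improved = True
    while improved:
        improved = False
        for s in range(len(best)):
            for a in range(n_actions):
                if a == best[s]:
                    continue
                cand = best[:s] + [a] + best[s + 1:]
                val = evaluate(cand, P, R, C)[start]
                if val > best_val + 1e-9:
                    best, best_val, improved = cand, val, True
    return best, best_val

# Toy MDP (assumed): state 0 -> state 1 -> absorbing state 2.
P = [[[(1, 1.0)], [(1, 1.0)]],
     [[(2, 1.0)], [(2, 1.0)]],
     [[(2, 1.0)], [(2, 1.0)]]]
R = [[1.0, 0.8],    # in state 0, action 0 is slightly better
     [-10.0, 0.9],  # in state 1, action 0 is very costly
     [0.0, 0.0]]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]            # perfect perception
CONFUSED = [[0.7, 0.3, 0], [0.3, 0.7, 0], [0, 0, 1]]    # states 0 and 1 get mixed up

mdp_optimal = [0, 1, 0]
value_perfect = evaluate(mdp_optimal, P, R, IDENTITY)[0]
value_confused = evaluate(mdp_optimal, P, R, CONFUSED)[0]
robust_policy, robust_value = hill_climb(mdp_optimal, P, R, CONFUSED)
```

In this toy setting the policy `[0, 1, 0]` is best under perfect perception, but under the assumed confusion the human sometimes takes the costly action in state 1, so the search moves to a policy that prescribes the same safe action in both confusable states.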