Keywords: State-Aliasing, Human Errors, Markov Decision Processes
TL;DR: Synthesizing Policies That Account For Human Execution Errors Caused By State Aliasing In Markov Decision Processes
Abstract: When humans are given a policy to execute, we expect erroneous executions and delays due to possible confusion in identifying a state. So if an algorithm computes a policy for a human to execute, it ought to account for these effects in its decisions. An optimal policy that is poorly executed may be much worse than a suboptimal policy that is executed faithfully and faster. In this paper, we consider these problems of delays and erroneous execution when computing policies for humans acting in a domain modeled by a Markov Decision Process (MDP). We present an algorithm to search for such policies and show experimental results in a Warehouse Worker domain and a Gridworld domain. We also present human studies showing how our assumptions translate to real-world behavior.