Synthesizing Policies That Account For Human Execution Errors Caused By State Aliasing In Markov Decision Processes

Sriram Gopalakrishnan; Mudit Verma; Subbarao Kambhampati

Published: 19 Jul 2021, Last Modified: 05 May 2023, XAIP 2021, Readers: Everyone

Keywords: State-Aliasing, Human Errors, Markov Decision Processes

Abstract: When humans are given a policy to execute, we expect erroneous executions and delays caused by confusion in identifying the current state. An algorithm that computes a policy for a human to execute should therefore account for these effects. An optimal policy that is poorly executed may be much worse than a suboptimal policy that is executed faithfully and faster. In this paper, we consider these problems of delay and erroneous execution when computing policies for humans acting in a domain modeled by a Markov Decision Process (MDP). We present an algorithm to search for such policies and show experimental results in a Warehouse Worker domain and a Gridworld domain. We also present human studies to show how our assumptions translate to real-world behavior.
