Approximation Algorithms for Observer Aware MDPs

Shuwa Miura; Olivier Buffet; Shlomo Zilberstein

Published: 26 Apr 2024, Last Modified: 15 Jul 2024. Venue: UAI 2024 poster. License: CC BY 4.0

Keywords: MDP, OAMDP, legibility, theory of mind, HRI

TL;DR: We present approximation algorithms for Observer-Aware Markov Decision Processes (OAMDPs), MDPs in which rewards depend on the beliefs of an observer.

Abstract: We present approximation algorithms for Observer-Aware Markov Decision Processes (OAMDPs). OAMDPs model sequential decision-making problems in which rewards depend on the beliefs of an observer about the goals, intentions, or capabilities of the observed agent. The first proposed algorithm is grid-based value iteration (Grid-VI), which discretizes the observer's belief into regular grids. Based on the same discretization, the second proposed algorithm is a variant of Real-Time Dynamic Programming (RTDP) called Grid-RTDP. Unlike Grid-VI, Grid-RTDP focuses its updates on promising states using heuristic estimates. We provide theoretical guarantees for the proposed algorithms and demonstrate that Grid-RTDP achieves good anytime performance, comparable to that of the existing approach, which lacks performance guarantees.
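
The discretization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the observer's belief is a probability vector over a finite set of hypotheses (for example, candidate goals) and that the regular grid consists of vectors whose entries are multiples of 1/resolution. The function name `snap_to_grid`, the `resolution` parameter, and the largest-remainder rounding rule are illustrative assumptions, not the paper's method; the actual algorithms may, for instance, interpolate between neighboring grid points instead of rounding.

```python
from fractions import Fraction


def snap_to_grid(belief, resolution):
    """Round a belief vector to a nearby point on a regular grid.

    Grid points are probability vectors whose entries are multiples of
    1/resolution. Largest-remainder rounding is an assumption made for
    this sketch, not a rule taken from the paper.
    """
    if resolution < 1:
        raise ValueError("resolution must be a positive integer")
    total = sum(belief)
    if total <= 0:
        raise ValueError("belief must have positive total mass")

    # Scale so that the entries sum to `resolution` (up to float error).
    scaled = [resolution * p / total for p in belief]
    # Floor each entry; entries are non-negative, so int() floors.
    counts = [int(s) for s in scaled]
    # Units still to hand out so that the counts sum to `resolution`.
    shortfall = resolution - sum(counts)
    # Give the leftover units to the largest fractional remainders.
    order = sorted(
        range(len(scaled)),
        key=lambda i: scaled[i] - counts[i],
        reverse=True,
    )
    for i in order[:shortfall]:
        counts[i] += 1
    return tuple(Fraction(c, resolution) for c in counts)


if __name__ == "__main__":
    # Hypothetical belief over three candidate goals.
    print(snap_to_grid([0.62, 0.30, 0.08], resolution=4))
```

A coarser grid yields fewer distinct belief points to store and update, at the cost of a larger rounding error; this sketch does not attempt to quantify that trade-off.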

List Of Authors: Shuwa Miura, Olivier Buffet, Shlomo Zilberstein

Submission Number: 302