Keywords: Decision and Control, Missing Data, Model-Based RL, Planning, Reinforcement Learning
TL;DR: We introduce a framework that integrates the theory of missing data into POMDP planning and propose algorithms for learning observation functions under different missingness processes.
Abstract: We introduce *missingness-MDPs* (miss-MDPs), a subclass of partially observable Markov decision processes (POMDPs) that incorporates the theory of missing data. Miss-MDPs capture settings where, at each step, the current state may go missing, that is, the state is not observed.
Missingness of observations occurs dynamically and is caused by a *missingness function*, which governs the underlying probabilistic missingness process.
Miss-MDPs distinguish three types of missingness processes as restrictions on the missingness function: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).
Our goal is to compute a policy for a miss-MDP with an *unknown missingness function*.
We propose algorithms that use a retrospective dataset and exploit the type of the underlying missingness process
to approximate the missingness function and, thereby, the true miss-MDP.
The algorithms can approximate a subset of MAR and MNAR missingness functions, and
we show that, for these, the optimal policy in the approximated model is $\varepsilon$-optimal in the true miss-MDP.
The empirical evaluation confirms these findings.
Additionally, it shows that our approach becomes more sample-efficient when exploiting the type of the underlying missingness process.
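The following is a minimal illustrative sketch, not taken from the submission: all names, the tabular setup, and the specific probabilities are assumptions. It shows how the three missingness regimes named in the abstract can be encoded as missingness functions over a small tabular state space, and how an MCAR/MAR missingness probability might be estimated from a retrospective dataset by empirical counts.

```python
import random
from collections import defaultdict

# Illustrative tabular miss-MDP fragment: states are integers; at each step a
# missingness function returns the probability that the state goes missing.

def mcar_missingness(state, prob=0.2):
    # MCAR: the missingness probability is independent of the state.
    return prob

def mar_missingness(state, last_observation):
    # MAR: the missingness probability depends only on previously observed data
    # (here, the last observed state), never on the currently missing value.
    return 0.5 if last_observation == 0 else 0.1

def mnar_missingness(state):
    # MNAR: the missingness probability depends on the (unobserved) current state.
    return 0.6 if state == 2 else 0.05

def observe(state, miss_prob, rng=random):
    """Return the observation: the state itself, or None if it goes missing."""
    return None if rng.random() < miss_prob else state

def estimate_mar_missingness(dataset):
    """Estimate a MAR missingness probability from a retrospective dataset of
    (last_observation, observation) pairs via empirical frequencies."""
    missing, total = defaultdict(int), defaultdict(int)
    for last_obs, obs in dataset:
        total[last_obs] += 1
        if obs is None:
            missing[last_obs] += 1
    return {ctx: missing[ctx] / total[ctx] for ctx in total}
```

Under MCAR such an estimate collapses to a single frequency, while identifying MNAR missingness from counts alone generally requires additional structure, consistent with the abstract's restriction to a subset of MAR and MNAR missingness functions.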
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Joshua_Wendland2
Track: Regular Track: unpublished work
Submission Number: 128