Inverse Reinforcement Learning for Restless Multi-Armed Bandits with Application to Maternal and Child Health

Anonymous

11 Jun 2023 (modified: 01 Sept 2023)
IJCAI 2023 Workshop BridgeAICCHE Blind Submission
Readers: Everyone
Keywords: Inverse Reinforcement Learning, Restless Multi Armed Bandits, Decision Focused Learning, Public Health
TL;DR: We design a new inverse reinforcement learning algorithm that can be applied to the maternal and child health space, where the number of beneficiaries is very large.
Abstract: We study restless multi-armed bandits (RMABs) in the context of public health, where resource allocation decisions must be optimized. Prior work on RMABs typically solves for the optimal planning policy under the assumption that the reward function is fully known. In this work, we instead study whether the optimal rewards for an RMAB problem can be learned from demonstrated, ideal behavior. To achieve this, we turn to inverse reinforcement learning (IRL), a field of study motivated by the desire to understand and learn the reward structure underlying an agent's observed behavior. Existing IRL approaches predominantly focus on single-agent systems, which limits their ability to handle the expansive state spaces characteristic of public health scenarios, where tens of thousands of arms are active simultaneously. We propose a new IRL algorithm specifically for RMAB settings that uses techniques from decision-focused learning (DFL) to directly optimize the objective function, allowing for efficient and accurate updates to the learned rewards. We compare our algorithm with the maximum entropy IRL baseline on runtime and accuracy and find that it performs better on both metrics. We also propose a framework for applying this algorithm in the public health domain, where expert trajectories come from domain experts.