Recommending Actions to Improve Engagement for Diabetes Management using Off-Policy Learning
- Keywords: off-policy evaluation, off-policy learning, contextual multi-armed bandit, recommendation, digital health care
- Abstract: Managing diabetes is a complex task that requires multiple timely actions, such as checking blood glucose, planning and maintaining a diet, taking medications, and talking to specialists at the appropriate time. Regular engagement with diabetes programs has been shown to improve patients’ knowledge, actions, and outcomes. Digital health systems increasingly use mobile notifications to provide education, reminders, and positive reinforcement. In this work, we formulate the personalization and recommendation of these types of health nudges as a Contextual Multi-Armed Bandit problem. Specifically, we apply off-policy evaluation (OPE) and off-policy learning (OPL) to optimize engagement through personalized recommendations. We evaluate and compare five OPE algorithms on data from a large digital health platform: Inverse Propensity Scoring (IPS), Self-Normalized Inverse Propensity Scoring (SN-IPS), Direct Method (DM), Doubly Robust (DR), and SWITCH Doubly Robust (SWITCH-DR). SWITCH-DR and DR perform best. We also build an OPL pipeline based on BanditNet with a Doubly Robust objective function. The trained policy achieves a 35% increase in predicted click-through rate (CTR) compared to a matrix factorization policy. Such increases can help patients take healthier actions that improve health outcomes.
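To make the comparison concrete, the following is a minimal sketch of three of the estimators named in the abstract (IPS, SN-IPS, and DR) for logged bandit feedback. The function names, argument names, and the reward-model inputs to `doubly_robust` are illustrative, not the paper's implementation.

```python
import numpy as np

def ips(rewards, logging_probs, target_probs):
    """Inverse Propensity Scoring: reweight logged rewards by the
    ratio of target-policy to logging-policy action probabilities."""
    weights = target_probs / logging_probs
    return np.mean(weights * rewards)

def snips(rewards, logging_probs, target_probs):
    """Self-Normalized IPS: normalize by the sum of importance
    weights, which reduces variance at the cost of slight bias."""
    weights = target_probs / logging_probs
    return np.sum(weights * rewards) / np.sum(weights)

def doubly_robust(rewards, logging_probs, target_probs,
                  q_logged, q_target):
    """Doubly Robust: a reward-model (Direct Method) baseline plus an
    IPS correction applied to the model's residual on logged actions.
    q_logged: model's predicted reward for each logged action;
    q_target: model's expected reward under the target policy."""
    weights = target_probs / logging_probs
    return np.mean(q_target + weights * (rewards - q_logged))
```

With a zero reward model, the DR estimate reduces to plain IPS; with an accurate reward model, the residual term shrinks and the estimator's variance drops, which is consistent with DR and SWITCH-DR performing best in the abstract's comparison.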