Learning-to-defer for sequential medical decision-making under uncertainty
Abstract: Learning-to-defer is a framework for automatically deferring decision-making to a human expert when ML-based decisions are deemed unreliable. Existing learning-to-defer frameworks are not designed for sequential settings: they defer at each instance independently, based on immediate predictions, and ignore the potential long-term impact of these interventions. As a result, existing frameworks are myopic. Further, they do not defer adaptively, which is crucial when human interventions are costly. In this work, we propose Sequential Learning-to-Defer (SLTD), a framework for learning to defer to a domain expert in sequential decision-making settings. In contrast to the existing literature, we pose learning-to-defer as model-based reinforcement learning (RL) to i) account for the long-term consequences of ML-based actions (via RL) and ii) defer adaptively based on the learned dynamics (via the model). Our framework determines whether to defer at each time step by quantifying whether deferring now would improve the value compared to delaying deferral to the next time step; in quantifying this improvement, we account for potential future deferrals. As a result, we learn a pre-emptive deferral policy, i.e., a policy that defers early if following the ML-based policy could worsen long-term outcomes. The deferral policy adapts to non-stationarity in the dynamics. We demonstrate that adaptive deferral via SLTD provides an improved trade-off between long-term outcomes and deferral frequency on synthetic, semi-synthetic, and real-world data with non-stationary dynamics. Finally, we justify each deferral decision by decomposing the propagated (long-term) uncertainty around the outcome.
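The deferral criterion described in the abstract can be illustrated with a minimal sketch (not the authors' code): at each step, defer to the expert if the estimated long-term value of deferring now, net of the deferral cost, exceeds the value of acting with the ML policy while keeping the option to defer later. All names here (`V_expert`, `V_ml`, `defer_cost`) are hypothetical placeholders for learned value estimates.

```python
# Hypothetical sketch of a pre-emptive deferral rule in the spirit of SLTD.
# V_expert(s): estimated value of deferring to the expert now.
# V_ml(s): estimated value of following the ML policy now,
#          accounting for potential future deferrals.
# defer_cost: per-intervention cost of involving the human expert.

def should_defer(state, V_expert, V_ml, defer_cost):
    """Defer now iff the long-term value gain outweighs the deferral cost."""
    gain = V_expert(state) - V_ml(state)
    return gain > defer_cost

# Toy example: a scalar "drift" state where the ML policy's value degrades
# as the non-stationary dynamics move away from the training regime.
V_expert = lambda s: 10.0                       # expert value, assumed stable
V_ml = lambda s: 10.0 - 4.0 * s                 # ML value worsens with drift s

print(should_defer(0.1, V_expert, V_ml, 1.0))   # small drift -> False
print(should_defer(0.8, V_expert, V_ml, 1.0))   # large drift -> True
```

Under these toy assumptions, the rule defers only once the modeled drift makes the ML policy's expected long-term outcome worse than the expert's by more than the deferral cost.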
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We incorporated all reviewer suggestions:
1. Added experiments varying the sample size.
2. Added importance sampling as an alternative estimation method for learning-to-defer, and updated all results in the main paper and appendix to reflect analyses based on both estimation methods used for evaluating the LTD methods.
3. Re-ran all experiments to ensure consistency between both estimation methods, and included a discussion of how model behavior changes depending on the OPE method used for model selection.
4. Expanded the discussion of the experiments with the requested clarifications, including additional discussion of all results.
5. Expanded the literature review to reflect reviewer recommendations, such as including state-of-the-art reviews on OPPE and connections to causal inference.
Supplementary Material: zip
Assigned Action Editor: ~Jessica_Schrouff1
Submission Number: 664