An Examination of Preference-based Reinforcement Learning for Treatment Recommendation

Nan Xu; Nitin Kamra; Yan Liu

28 Sept 2020 (modified: 05 May 2023)

ICLR 2021 Conference Blind Submission

Readers: Everyone

Keywords: Preference-based Reinforcement Learning, Treatment Recommendation, healthcare

Abstract: Treatment recommendation is a complex, multi-faceted problem with many conflicting objectives, e.g., optimizing the survival rate (or expected lifetime), mitigating negative impacts, reducing financial expenses and time costs, and avoiding over-treatment. While this complicates hand-engineering a reward function for learning treatment policies, qualitative feedback from human experts is fortunately readily available and can be easily exploited. Since direct estimation of rewards via inverse reinforcement learning is challenging and requires the existence of an optimal human policy, the field of treatment recommendation has recently witnessed the development of the preference-based Reinforcement Learning (PRL) framework, which infers a reward function from only qualitative and imperfect human feedback so that a human expert's preferred policy has a higher expected return than a less preferred policy. In this paper, we first present an open simulation platform that models the progression of two diseases, namely cancer and sepsis, and the reactions of affected individuals to the treatment they receive. Second, via simulated experiments, we investigate important problems in adopting preference-based RL approaches for treatment recommendation, such as the advantages of learning from preferences over a hand-engineered reward, handling incomparable policies, reward interpretability, and agent design. The designed simulation platform and the insights obtained for preference-based RL approaches are beneficial for achieving the right trade-off between various human objectives during treatment recommendation.
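To make the core PRL idea concrete (infer a reward under which preferred behavior earns a higher return), here is a minimal sketch. It is not the paper's method: it assumes pairwise trajectory comparisons as a stand-in for policy preferences, a linear reward r(s, a) = w · phi(s, a) over hypothetical per-step features, and a Bradley-Terry (logistic) likelihood P(tau_i preferred to tau_j) = sigmoid(R_w(tau_i) - R_w(tau_j)), a common choice in the preference-learning literature. The names `fit_reward`, `demo`, and the two-feature "benefit vs. side effects" setup are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

# A trajectory is a list of per-step feature vectors phi(s_t, a_t).
# Hypothetical representation; the paper's state/action encoding is not given here.
Trajectory = List[Sequence[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def trajectory_return(w: Sequence[float], traj: Trajectory) -> float:
    """Undiscounted return under an assumed linear reward r(s, a) = w . phi(s, a)."""
    return sum(sum(wi * fi for wi, fi in zip(w, step)) for step in traj)


def feature_sum(traj: Trajectory, dim: int) -> List[float]:
    """Sum of per-step features, so that R_w(traj) = w . feature_sum(traj)."""
    total = [0.0] * dim
    for step in traj:
        for i, f in enumerate(step):
            total[i] += f
    return total


def fit_reward(prefs: List[Tuple[Trajectory, Trajectory]], dim: int,
               lr: float = 0.1, epochs: int = 200, l2: float = 1e-3) -> List[float]:
    """Gradient descent on the Bradley-Terry loss -log sigmoid(R(better) - R(worse))."""
    if not prefs:
        raise ValueError("need at least one preference pair")
    # Feature-sum differences do not depend on w, so compute them once.
    diffs = []
    for better, worse in prefs:
        fb, fw = feature_sum(better, dim), feature_sum(worse, dim)
        diffs.append([b - c for b, c in zip(fb, fw)])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            # d/dw [-log sigmoid(w . d)] = (sigmoid(w . d) - 1) * d
            coeff = sigmoid(margin) - 1.0
            for i in range(dim):
                grad[i] += coeff * d[i]
        for i in range(dim):
            # Mean gradient plus a small L2 penalty to keep w bounded.
            w[i] -= lr * (grad[i] / len(diffs) + l2 * w[i])
    return w


def demo(seed: int = 0) -> Tuple[List[float], float]:
    """Synthetic check: recover a hidden trade-off from pairwise preferences only."""
    rng = random.Random(seed)
    true_w = [1.0, -0.5]  # hypothetical expert trade-off: benefit vs. side effects

    def rand_traj() -> Trajectory:
        return [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(5)]

    prefs = []
    for _ in range(200):
        a, b = rand_traj(), rand_traj()
        # The simulated "expert" prefers whichever trajectory scores higher under true_w.
        if trajectory_return(true_w, a) >= trajectory_return(true_w, b):
            prefs.append((a, b))
        else:
            prefs.append((b, a))
    w = fit_reward(prefs, dim=2)
    agree = sum(trajectory_return(w, x) > trajectory_return(w, y) for x, y in prefs)
    return w, agree / len(prefs)


if __name__ == "__main__":
    w, acc = demo()
    print("learned weights:", w)
    print("fraction of preferences reproduced:", acc)
```

Because only comparisons are observed, the learned reward is identifiable at best up to a positive scale factor, so the sign pattern and the ratio of the weights are what matter, not their magnitude. The simulated expert here is noiseless; the "imperfect" feedback mentioned above would correspond to randomly flipping some labels, which the logistic likelihood tolerates.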

One-sentence Summary: Develop a simulation platform and investigate preference-based reinforcement learning approaches for treatment recommendation

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Reviewed Version (pdf): https://openreview.net/references/pdf?id=pMKUh7w7PE
