Best of Both Worlds: Harmonizing LLM Capabilities in Decision-Making and Question-Answering for Treatment Regimes
Keywords: Dynamic Treatment Regime, Diabetes, Reinforcement Learning, Large Language Model, RLHF
TL;DR: This paper introduces a novel framework that fine-tunes large language models (LLMs) with reinforcement learning (RL) in a dynamic treatment regime (DTR) setting, improving LLM performance on both medical language tasks and DTR tasks.
Abstract: This paper introduces a framework that fine-tunes large language models (LLMs) with reinforcement learning (RL) for dynamic treatment regimes (DTRs). Within the RL training framework, our bilevel-LLM framework uses feedback signals from the DTR environment for "RL with Environment Feedback" (RLEF) fine-tuning to achieve best-of-both-worlds results. Experimental results show that the LLM-RLEF agent outperforms both existing RL policies and pure LLM policies on the SimGlucoseEnv treatment regime task, improving sampling efficiency, generalizability, and interpretability. In addition to improving DTR performance, RLEF improves the LLM's question-answering ability on the MMLU-Med, MedQA, and MedMCQA benchmarks.
Supplementary Material: zip
Submission Number: 85