Best of Both Worlds: Harmonizing LLM Capabilities in Decision-Making and Question-Answering for Treatment Regimes

Published: 12 Oct 2024, Last Modified: 15 Dec 2024 | AIM-FM Workshop @ NeurIPS'24 Poster | License: CC BY 4.0
Keywords: Dynamic Treatment Regime, Diabetes, Reinforcement Learning, Large Language Model, RLHF
TL;DR: This paper introduces a novel framework that fine-tunes large language models (LLMs) with reinforcement learning (RL) for dynamic treatment regimes (DTRs), improving the LLM on both medical language tasks and DTR tasks.
Abstract: This paper introduces a framework that incorporates fine-tuning large language models (LLMs) with reinforcement learning (RL) for the application of dynamic treatment regimes (DTRs). Within the RL training framework, our bilevel-LLM framework uses feedback from the DTR environment for `RL with Environment Feedback' (RLEF) fine-tuning to achieve best-of-both-worlds results. Experimental results show that the LLM-RLEF agent outperforms both existing RL policies and pure LLM policies on the \emph{SimGlucoseEnv} treatment-regime task, improving sample efficiency, generalizability, and interpretability. Beyond improving DTR performance, RLEF also improves the LLM's question-answering ability on the MMLU-Med, MedQA, and MedMCQA benchmarks.
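
To make the RLEF idea concrete, the loop can be sketched as follows: the LLM policy proposes an insulin dose from the current observation, and the environment's scalar reward is fed back as a fine-tuning signal. The sketch below is a minimal illustration under assumed interfaces (a Gym-compatible glucose-control environment; the helpers llm_propose_dose and fine_tune_step are hypothetical placeholders), not the paper's actual implementation.

    # Minimal RLEF sketch: an LLM policy acts in a glucose-control environment
    # and is fine-tuned with a REINFORCE-style update on environment rewards.
    # llm_propose_dose and fine_tune_step are hypothetical stubs, not the
    # paper's implementation.

    def llm_propose_dose(observation):
        """Hypothetical LLM call: serialize the observation into a prompt,
        parse the generated insulin dose, and return (dose, log_prob)."""
        raise NotImplementedError

    def fine_tune_step(log_probs, returns):
        """Hypothetical policy-gradient update on the LLM's weights,
        ascending the gradient of sum(log_prob * return)."""
        raise NotImplementedError

    def run_rlef_episode(env, gamma=0.99):
        obs, _ = env.reset()
        log_probs, rewards = [], []
        done = False
        while not done:
            dose, log_prob = llm_propose_dose(obs)
            obs, reward, terminated, truncated, _ = env.step([dose])
            log_probs.append(log_prob)
            rewards.append(reward)  # environment feedback, e.g. a glucose risk score
            done = terminated or truncated
        # Discounted returns-to-go give the per-step learning signal.
        returns, g = [], 0.0
        for r in reversed(rewards):
            g = r + gamma * g
            returns.append(g)
        returns.reverse()
        fine_tune_step(log_probs, returns)

    # Usage (environment id is an assumption; simglucose registers Gym envs):
    #   import gymnasium as gym
    #   env = gym.make("simglucose-adolescent2-v0")
    #   run_rlef_episode(env)

In this reading, the "environment feedback" of RLEF plays the role that human preference labels play in RLHF, with the DTR reward supplying the fine-tuning signal directly.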
Supplementary Material: zip
Submission Number: 85