Keywords: reasoning LLM, reinforcement learning with verifiable rewards, computational pathology
TL;DR: We formalize diagnosis as a two-turn evidence-seeking task, use RL with diagnostic rewards to guide LLMs, and introduce RAGES for realistic follow-up data, yielding improved accuracy and plausibility over strong baselines.
Abstract: Recent large language models (LLMs) excel at reasoning but often assume complete information, whereas real-world tasks, such as medical diagnosis, require iterative collection of evidence. Existing research rarely reflects this process, instead treating diagnosis as a one-turn task. This work explicitly formalizes diagnosis as a two-turn paradigm and proposes reinforcement learning with diagnostic evidence-seeking rewards to guide LLMs in requesting and using evidence. We further introduce Retrieval-Augmented Generation-based Examination Simulation (RAGES), which generates realistic and plausible follow-up evidence to facilitate this process. Experiments on multilingual datasets show that (1) LLMs significantly improve diagnostic accuracy with additional evidence, (2) our model outperforms or matches larger and reasoning-enhanced baselines, and (3) RAGES generates more plausible results than pure LLM generation.
Primary Area: reinforcement learning
Submission Number: 11401