Keywords: reasoning LLM, reinforcement learning with verifiable rewards, computational pathology
TL;DR: We formalize diagnosis as a two-turn evidence-seeking task, use RL with diagnostic rewards to guide LLMs, and introduce RAGES for realistic follow-up data, yielding improved accuracy and plausibility over strong baselines.
Abstract: Recent large language models (LLMs) excel at reasoning but often assume complete information, whereas real-world tasks, such as medical diagnosis, require iterative collection of evidence. Existing research rarely reflects this process, instead treating diagnosis as a one-turn task. This work explicitly formalizes diagnosis as a two-turn paradigm and proposes reinforcement learning with diagnostic evidence-seeking rewards to guide LLMs in requesting and using evidence. We further introduce Retrieval-Augmented Generation-based Examination Simulation (RAGES), which generates realistic and plausible follow-up evidence to facilitate this process. Experiments on multilingual datasets show that (1) LLMs significantly improve diagnostic accuracy with additional evidence, (2) our model outperforms or matches larger and reasoning-enhanced baselines, and (3) RAGES generates more plausible results than pure LLM generation.
Primary Area: reinforcement learning
Submission Number: 11401