Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversation

ACL ARR 2025 May Submission4401 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Current medical AI models are trained primarily on static articles and question-answering (QA) tasks, and then evaluated on similar QA benchmarks. However, previous approaches fail to capture the dynamic real-world nature of clinical reasoning, particularly in handling ambiguous inputs (e.g., conflicting symptoms) and multi-step decision-making. To address this, we: (1) introduce a comprehensive diagnostic benchmark, MuddyMaze, evaluating clinical reasoning with controlled noise and USMLE-aligned difficulty levels; (2) curate a new dialogue dataset by converting 10.2k medical QA pairs and 12k PubMed articles into clinician-patient interactions; and (3) develop dialogue-based fine-tuning that enhances reasoning capabilities. Experiments demonstrate significant improvements over traditional methods (+16.10% in one-round accuracy and +4.06% in multi-round reasoning), validating that dialogue-based training better aligns AI systems with real clinical workflows.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Dialogue and Interactive Systems; Healthcare applications; Clinical NLP;
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 4401