MedVerse: Efficient and Reliable Medical Reasoning via DAG-Structured Parallel Execution

ACL ARR 2026 January Submission5917 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Medical LLM, LLM Reasoning, Parallel Generation
Abstract: Large language models (LLMs) have demonstrated strong performance and rapid progress across a wide range of medical reasoning tasks. However, their sequential autoregressive decoding forces inherently parallel clinical reasoning, such as differential diagnosis, into a single linear reasoning path, limiting both efficiency and reliability on complex medical problems. To address this, we propose MedVerse, a reasoning framework for complex medical inference that reformulates medical reasoning as a parallelizable directed acyclic graph (DAG) process grounded in Petri Net theory. The framework adopts a full-stack design spanning data, model architecture, and system execution. For data creation, we introduce the MedVerse Curator, an automated pipeline that synthesizes knowledge-grounded medical reasoning paths and transforms them into Petri Net–structured representations. At the architectural level, we propose a topology-aware attention mechanism with adaptive position indices that supports parallel reasoning while preserving logical consistency. At the system level, we develop a customized inference engine that supports memory-efficient parallel execution. Empirical evaluations show that MedVerse improves strong general-purpose LLMs by up to 8.9%. Compared to specialized medical LLMs, MedVerse achieves comparable performance while delivering a 1.3× reduction in inference latency and a 1.7× increase in generation throughput, enabled by its parallel decoding capability.
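The abstract combines two mechanisms: scheduling a reasoning DAG so that independent steps (for example, competing diagnoses) can be decoded concurrently, and a topology-aware attention mask with adaptive position indices. The Python sketch below illustrates one plausible reading of both under stated assumptions. It is not the authors' implementation. The function names `parallel_waves` and `topology_layout`, the wave-based Petri-net firing rule, and the specific mask and position-id rules are all hypothetical.

```python
# Hypothetical sketch, not MedVerse's released code: the graph encoding,
# the wave scheduler and the position/mask rules below are assumptions.

def parallel_waves(deps):
    """Group reasoning steps into waves whose members can be decoded concurrently.

    `deps` maps every step to the list of steps whose conclusions it consumes.
    In Petri-net terms a step is a transition that becomes enabled once all of
    its input places hold a token; every enabled step in a wave fires together.
    """
    missing = {p for ps in deps.values() for p in ps} - set(deps)
    if missing:
        raise ValueError(f"unknown prerequisite steps: {sorted(missing)}")
    remaining = {n: set(ps) for n, ps in deps.items()}
    done, waves = set(), []
    while remaining:
        # A step is enabled when all of its prerequisites have already fired.
        enabled = sorted(n for n, ps in remaining.items() if ps <= done)
        if not enabled:
            raise ValueError("cycle detected: the reasoning graph is not a DAG")
        waves.append(enabled)
        done.update(enabled)
        for n in enabled:
            del remaining[n]
    return waves


def topology_layout(deps, lengths):
    """Return flattened step order, per-token position ids and an attention mask.

    Assumed rules: a token sees earlier tokens of its own step plus every token
    of its ancestor steps (never a sibling branch), and each step's position ids
    restart right after its furthest parent, so parallel branches share positions.
    """
    order = [n for wave in parallel_waves(deps) for n in wave]
    ancestors, start = {}, {}
    for n in order:  # wave order guarantees parents are processed first
        ancestors[n] = set(deps[n]).union(*(ancestors[p] for p in deps[n]))
        start[n] = max((start[p] + lengths[p] for p in deps[n]), default=0)
    tokens = [(n, j) for n in order for j in range(lengths[n])]
    position_ids = [start[n] + j for n, j in tokens]
    # mask[i][k] is True when token i may attend to token k.
    mask = [
        [(m == n and k <= j) or m in ancestors[n] for m, k in tokens]
        for n, j in tokens
    ]
    return order, position_ids, mask


if __name__ == "__main__":
    # Toy differential diagnosis: two candidate diagnoses evaluated in parallel.
    deps = {
        "findings": [],
        "dx_A": ["findings"],
        "dx_B": ["findings"],
        "conclusion": ["dx_A", "dx_B"],
    }
    lengths = {"findings": 3, "dx_A": 2, "dx_B": 2, "conclusion": 1}
    print(parallel_waves(deps))  # [['findings'], ['dx_A', 'dx_B'], ['conclusion']]
    order, pos, mask = topology_layout(deps, lengths)
    print(pos)  # [0, 1, 2, 3, 4, 3, 4, 5]
```

In this toy graph, `dx_A` and `dx_B` reuse positions 3 and 4 and are masked from each other, so under these assumptions they could be decoded as one batch, while `conclusion` attends to both branches and to the shared findings.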
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: Clinical and Biomedical Applications, Generation, Language Modeling, NLP Applications, Question Answering
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 5917