A Survey on LLM-based Multi-Agent AI Hospital

ACL ARR 2026 January Submission990 Authors

26 Dec 2025 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: AI hospital, Multi-agent LLMs, Tool-augmented LLMs, LLM-as-a-Judge, Safety alignment & guardrails, Simulation
Abstract: AI hospitals are workflow-level multi-agent systems built on large language models that run inside clinical processes. Agents take explicit roles, maintain shared state through handoffs, use EHR- and guideline-grounded tools, and operate under safety gateways with audit logs. Prior work is rich but fragmented across tasks and settings. This survey defines the scope and boundaries of AI hospitals and compiles designs into a compact taxonomy with head-to-head trade-off matrices. We introduce a layered evaluation stack that measures safety, clinical processes, outcomes, and operations (e.g., time-to-disposition, throughput, and token/latency costs), and we use Integration Readiness Levels (IRL1–IRL6) to gate autonomy from sandbox to deployment, with required logs and pass criteria. To make deployment claims testable, we map key integration tasks to minimal instrumentation and formulate several challenges as workflow-failure mechanisms with concrete tests and IRL gates. We close with a practical roadmap on workflow-aware memory, queue-aware planning, escalation learning, traceability, and playbook adoption.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Interdisciplinary NLP, Multi-Agent Systems, Simulation-Based Evaluation, NLP for Healthcare
Contribution Types: Surveys
Languages Studied: English
Submission Number: 990