Benchmarking Multi-Modal Cardiological Diagnostics within the LLM-as-Agent Paradigm

ACL ARR 2025 February Submission5076 Authors

16 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Large language models (LLMs) have revolutionized cardiological diagnostics through agentic design. However, a significant challenge remains: the misalignment between real-world clinical reports used in hospitals and the publicly available datasets used to fine-tune LLMs. This discrepancy limits the reliability of LLMs in cardiological practice. In this work, we address this gap from two key perspectives. First, we introduce Z-Bench, a benchmark derived from in-hospital cardiological reports, in which patient records comprise multimodal electrocardiograms (ECGs) enriched with cardiological metrics. Second, we propose Zodiac, an LLM-powered agentic framework designed to enhance cardiological diagnostics. Zodiac systematically extracts clinically relevant characteristics, detects significant arrhythmias, and generates preliminary diagnostic reports, which are then reviewed and refined by cardiologists. Experimental results demonstrate that Zodiac surpasses industry-leading LLMs from OpenAI, Meta, Google, and DeepSeek, as well as medical-specialist models such as Microsoft’s BioGPT. Our findings highlight the transformative potential of specialized LLMs in healthcare, showcasing their ability to deliver medical solutions that meet the rigorous demands of cardiological guidelines.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: multimodal applications, healthcare applications, clinical NLP
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 5076