AgentRx: A Benchmark for Multimodal Clinical Forecasting with LLM Agents

Baraa Al Jorf; Farah E. Shamout

Published: 11 Jun 2026, Last Modified: 11 Jun 2026. Venue: Forecast@ICML26 Poster. License: CC BY 4.0

Keywords: Agentic AI, Machine Learning for Healthcare, Clinical Forecasting

TL;DR: Our benchmark study reveals that, for multimodal clinical forecasting, a single unified agent consistently outperforms collaborative multi-agent systems and is better calibrated than them.

Abstract: Effective clinical forecasting requires integrating heterogeneous multimodal data, including electronic health records, images, and clinical notes. While Large Language Model (LLM) agents offer a promising way to mitigate healthcare data fragmentation, their effectiveness in multimodal clinical risk forecasting remains largely unexplored. To address this gap, we introduce AgentRx, a systematic benchmark that evaluates single- and multi-agent LLM frameworks on unimodal and multimodal clinical prediction tasks using real-world data. Our findings show that single-agent frameworks outperform naive multi-agent systems, handle multimodal data more effectively, and are better calibrated.
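The abstract states that single-agent frameworks are better calibrated but does not say here which calibration metric the benchmark uses. Purely as an illustration of what "better calibrated" can mean for binary risk forecasts, the sketch below computes expected calibration error (ECE), one widely used metric. It assumes binary outcomes, a predicted probability of the positive outcome per patient, and equal-width bins; the function name `expected_calibration_error` and the default of 10 bins are hypothetical choices, not necessarily AgentRx's evaluation protocol.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binary ECE over equal-width probability bins (illustrative sketch).

    probs  -- predicted probability of the positive outcome, each in [0, 1]
    labels -- observed outcomes, each 0 or 1
    Returns the bin-size-weighted mean of |observed frequency - mean predicted prob|.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if len(probs) == 0:
        raise ValueError("need at least one prediction")

    n = len(probs)
    sum_p = [0.0] * n_bins   # sum of predicted probabilities per bin
    sum_y = [0.0] * n_bins   # number of positive outcomes per bin
    count = [0] * n_bins     # number of predictions per bin

    for p, y in zip(probs, labels):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        # A probability of exactly 1.0 is placed in the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        sum_p[b] += p
        sum_y[b] += y
        count[b] += 1

    ece = 0.0
    for s_p, s_y, c in zip(sum_p, sum_y, count):
        if c == 0:
            continue  # empty bins contribute nothing
        gap = abs(s_y / c - s_p / c)
        ece += (c / n) * gap
    return ece


if __name__ == "__main__":
    # Overconfident forecaster: predicts 0.9 risk, but neither event occurs.
    print(expected_calibration_error([0.9, 0.9], [0, 0]))
```

Lower ECE means predicted risks track observed event frequencies more closely; a score of 0 means that, within every bin, the mean predicted probability equals the observed event rate. Proper scoring rules such as the Brier score are a common complement, since ECE depends on the binning choice.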

Submission Number: 130