Meissa: Multi-modal Medical Agentic Intelligence

Yixiong Chen, Xinyi Bai, Yue Pan, Zongwei Zhou, Alan Yuille

Published: 09 Mar 2026, Last Modified: 14 Apr 2026. Venue: OpenReview Archive Direct Upload. Readers: Everyone. License: arXiv.org perpetual, non-exclusive license.

Abstract: Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making beyond single-pass inference. However, these systems rely almost entirely on proprietary frontier models (e.g., GPT), whose API-based deployment incurs high cost, high latency, and privacy risks that conflict with on-premise clinical requirements. We present Meissa, a lightweight 4B-parameter medical MM-LLM that brings full agentic capability offline. Instead of imitating static answers, Meissa learns both when to engage external interaction (strategy selection) and how to execute multi-step interaction (strategy execution) by distilling structured trajectories from frontier agent systems. Specifically, we propose: (1) Unified trajectory modeling: trajectories (reasoning and action traces) are represented within a single state–action–observation formalism, allowing one model to generalize across heterogeneous medical environments. (2) Three-tier stratified supervision: the model’s own errors trigger progressive escalation from direct reasoning to tool-augmented and then multi-agent interaction, so that the model explicitly learns difficulty-aware strategy selection. (3) Prospective–retrospective supervision: pairing exploratory forward traces with hindsight-rationalized execution traces enables stable learning of effective interaction policies. Trained on ∼40K curated trajectories, Meissa matches or exceeds proprietary frontier agents in 10 of 16 evaluation settings across 13 medical benchmarks spanning radiology, pathology, and clinical reasoning. With over 25× fewer parameters than typical frontier models such as Gemini-3, Meissa operates fully offline with ∼22× lower end-to-end latency than API-based deployment. Data, models, and environments are released at https://github.com/Schuture/Meissa.
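As an illustration of points (1) and (2), the following is a minimal Python sketch of one possible reading of the abstract, not the authors' implementation. It assumes that a trajectory is a list of state–action–observation steps and that a question is assigned to the lowest tier whose attempt matches the reference answer. All names here (`Step`, `Trajectory`, `TIERS`, `stratify`, `toy_solvers`) and the toy answers are hypothetical and do not come from the paper or its repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One state-action-observation triple (hypothetical schema)."""
    state: str        # context available before acting
    action: str       # e.g. "answer", "call_tool", "consult_agent"
    observation: str  # what the environment returned


@dataclass
class Trajectory:
    question: str
    tier: str
    steps: List[Step] = field(default_factory=list)
    answer: Optional[str] = None


# Assumed escalation order, from cheapest to most expensive strategy.
TIERS = ["direct", "tool", "multi_agent"]

Solver = Callable[[str], Trajectory]


def stratify(question: str, reference: str,
             solvers: Dict[str, Solver]) -> Optional[Trajectory]:
    """Return the trajectory from the lowest tier that answers correctly.

    A wrong answer at one tier triggers escalation to the next tier.
    Returns None if every tier fails.
    """
    for tier in TIERS:
        trajectory = solvers[tier](question)
        if trajectory.answer == reference:
            return trajectory
    return None


# Toy solvers with hard-coded answers, for illustration only.
def _direct(q: str) -> Trajectory:
    return Trajectory(q, "direct", [Step(q, "answer", "")], answer="A")


def _tool(q: str) -> Trajectory:
    steps = [Step(q, "call_tool", "tool output"),
             Step("tool output", "answer", "")]
    return Trajectory(q, "tool", steps, answer="B")


def _multi_agent(q: str) -> Trajectory:
    steps = [Step(q, "consult_agent", "second opinion"),
             Step("second opinion", "answer", "")]
    return Trajectory(q, "multi_agent", steps, answer="C")


toy_solvers: Dict[str, Solver] = {
    "direct": _direct,
    "tool": _tool,
    "multi_agent": _multi_agent,
}

if __name__ == "__main__":
    result = stratify("example question", "B", toy_solvers)
    print(result.tier, len(result.steps))  # prints: tool 2
```

In this sketch the tier label on the returned trajectory is what would serve as the strategy-selection target, while the steps would serve as the strategy-execution target; how the paper actually constructs and filters its supervision may differ.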