Keywords: Medical Visual Reasoning, Reinforcement Learning, Agentic AI, Sim2Real, Environment Simulation, Privacy-Preserving AI
TL;DR: We operationalize the Sim2Real paradigm for medical AI, training a reasoning agent entirely in a simulated environment that then performs robustly on real-world clinical tasks.
Abstract: Developing autonomous agents for complex Medical Visual Reasoning is a critical goal, yet training them in real-world clinical settings is largely infeasible due to severe privacy, data, and safety constraints. While retrieval-augmented methods exist, they often depend on impractical multimodal indexing or fail to address the core challenge of learning interactive policies without real-world exposure.
To bridge this gap, we introduce MedSimSearch, a novel framework based on Sim2Real Agentic Learning. The core innovation lies in leveraging a generative large multimodal model (LMM) to create a high-fidelity simulated retrieval environment. Within this safe, text-only simulation, our agent learns a robust search and reasoning policy, eliminating the need for multimodal data indexing while preserving patient privacy.
To validate our approach, we evaluate the agent trained in simulation on realistic medical benchmarks using a curated private text corpus. Extensive experiments on VQAMed2019 and OmniMedVQA demonstrate that MedSimSearch significantly surpasses strong retrieval-augmented generation (RAG) baselines and shows enhanced robustness against hallucinations, paving a viable path for deploying trustworthy medical AI agents.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 18525