One VLM, Two Roles: Stage-Wise Routing and Specialty-Level Deployment for Clinical Workflows

Shayan Vassef; Soorya Ram Shimgekar; Abhay Goyal; Christian Poellabauer; Koustuv Saha; Pi Zonooz; Navin Kumar

Published: 27 Nov 2025, Last Modified: 09 Dec 2025. Venue: ML4H 2025 Poster. License: CC BY 4.0

Keywords: Healthcare AI, Vision-Language Models, MedGemma, Model Cards, MLOps, Early termination, Confidence thresholds, Selective prediction

Track: Proceedings

Abstract: Clinical ML workflows are often fragmented and inefficient: triage, task selection, and model deployment are handled by a patchwork of task-specific networks. These pipelines are rarely aligned with data-science practice, reducing efficiency and increasing operational cost. They also lack data-driven model identification (from imaging/tabular inputs) and standardized delivery of model outputs. We present a framework that employs a single vision-language model (VLM) in two complementary, modular roles. First (Solution 1): the VLM acts as an aware model-card matcher that routes an incoming image to the appropriate specialist model via a three-stage workflow (modality -> primary abnormality -> model-card ID). Reliability is improved by (i) stage-wise prompts enabling early termination via "None"/"Other" and (ii) a calibrated top-2 answer selector with a stage-wise cutoff. This raises routing accuracy by +9 and +11 percentage points on the training and held-out splits, respectively, compared with a baseline router, and improves held-out calibration (lower Expected Calibration Error, ECE). Second (Solution 2): we fine-tune the same VLM on specialty-specific datasets so that one model per specialty covers multiple downstream tasks, simplifying deployment while maintaining performance. Across gastroenterology, hematology, ophthalmology, pathology, and radiology, this single-model deployment matches or approaches specialized baselines. Together, these solutions reduce data-science effort through more accurate selection, simplify monitoring and maintenance by consolidating task-specific models, and increase transparency via per-stage justifications and calibrated thresholds. Each solution stands alone, and in combination they offer a practical, modular path from triage to deployment.
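
The three-stage routing described in the abstract (modality -> primary abnormality -> model-card ID, with early termination on "None"/"Other" and a per-stage confidence cutoff) can be illustrated with a minimal sketch. The abstract does not specify how the top-2 selector works, so the rule used here is an assumption: the top answer is accepted only if its share of the top-2 confidence mass meets the stage's cutoff. The names `select_top2`, `route`, the stub scorers, and the example labels and cutoffs are all hypothetical stand-ins for the VLM prompts and are not taken from the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interface: a stage scorer receives the image reference and the
# answers chosen so far, and returns calibrated confidences per candidate answer.
Scorer = Callable[[str, List[str]], Dict[str, float]]

STOP_ANSWERS = {"None", "Other"}  # answers that trigger early termination


def select_top2(scores: Dict[str, float], cutoff: float) -> Optional[str]:
    """Assumed rule: accept the top answer only if its share of the
    top-2 confidence mass is at least `cutoff`; otherwise abstain."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    best, p1 = ranked[0]
    p2 = ranked[1][1] if len(ranked) > 1 else 0.0
    total = p1 + p2
    if total <= 0.0:
        return None
    return best if p1 / total >= cutoff else None


def route(
    image: str,
    stages: List[Tuple[str, Scorer]],
    cutoffs: Dict[str, float],
) -> Tuple[List[str], str]:
    """Run the stages in order; stop early on abstention or a stop answer."""
    path: List[str] = []
    for name, scorer in stages:
        answer = select_top2(scorer(image, path), cutoffs[name])
        if answer is None or answer in STOP_ANSWERS:
            return path, f"stopped at {name}"
        path.append(answer)
    return path, "routed"


# Toy stub scorers standing in for the stage-wise VLM prompts (made-up numbers).
def modality_stage(image: str, path: List[str]) -> Dict[str, float]:
    return {"X-ray": 0.90, "CT": 0.05, "Other": 0.05}


def abnormality_stage(image: str, path: List[str]) -> Dict[str, float]:
    return {"pneumonia": 0.48, "effusion": 0.45, "None": 0.07}


def model_card_stage(image: str, path: List[str]) -> Dict[str, float]:
    return {"card-017": 0.80, "card-004": 0.20}


STAGES = [
    ("modality", modality_stage),
    ("abnormality", abnormality_stage),
    ("model_card", model_card_stage),
]

strict = route("img.png", STAGES, {"modality": 0.6, "abnormality": 0.6, "model_card": 0.6})
lenient = route("img.png", STAGES, {"modality": 0.6, "abnormality": 0.5, "model_card": 0.6})
```

With the stricter abnormality cutoff the near-tie between the two leading abnormalities causes the router to stop at the second stage, while the looser cutoff lets it reach a model-card ID. This is the kind of trade-off a stage-wise cutoff is meant to expose, under the assumed selection rule.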

General Area: Applications and Practice

Specific Subject Areas: Deployment, Medical Imaging, Explainability & Interpretability, Foundation Models

Data And Code Availability: No

Ethics Board Approval: No

Entered Conflicts: I confirm the above

Anonymity: I confirm the above

Submission Number: 26
