T³: Test-Time Model Merging in VLMs for Zero-Shot Medical Imaging Analysis

18 Sept 2025 (modified: 12 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0
Keywords: medical imaging, vision language models, zero-shot generalization, model merging, healthcare
TL;DR: We propose sample-wise test-time model merging for vision-language models and validate its improved performance on four medical imaging classification tasks under a practical cross-dataset evaluation benchmark.
Abstract: In medical imaging, vision-language models face a critical trade-off: \textit{pretrained} networks offer broad robustness but miss subtle, modality-specific characteristics, while fine-tuned \textit{expert} models achieve high in-distribution accuracy yet falter under modality shift. Existing model-merging techniques, designed for natural-image benchmarks, are simple and efficient but fail to deliver consistent gains across diverse medical modalities; their static interpolation limits reliability in varied clinical tasks. To address this, we introduce \textbf{T}est-\textbf{T}ime \textbf{T}ask adaptive merging ($\mathbb{T^{3}}$), a backpropagation-free framework that computes \textit{per-sample} interpolation coefficients from the Jensen–Shannon divergence between the two models’ output distributions. $\mathbb{T^{3}}$ dynamically preserves local precision when the models agree and defers to generalist robustness under drift. To reduce the inference cost of sample-wise merging, we further propose a batch-wise extension, $\mathbb{T^{3}}_{\mathcal{B}}$, that computes a single merging coefficient across a batch of samples, substantially lowering the computational overhead. Recognizing the lack of a standardized medical-merging benchmark, we present a rigorous cross-evaluation protocol spanning in-domain, base-to-novel, and corruption settings across four modalities. Empirically, $\mathbb{T^{3}}$ sets a new state of the art in Top-1 accuracy and error reduction, outperforming strong baselines while maintaining efficiency, paving the way for adaptive medical vision-language model (MVLM) deployment in clinical settings.
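The merging idea described in the abstract can be illustrated with a minimal Python sketch. The abstract states only that the coefficient is derived from the Jensen–Shannon divergence between the two models' output distributions; the linear mapping `1 - JS / ln 2`, the use of a batch mean for the batch-wise variant, and all function names below are illustrative assumptions, not the authors' formulation.

```python
import math

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q) in nats; terms with p_i == 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_divergence(p, q):
    # Jensen-Shannon divergence between two class-probability vectors.
    # With natural logs it lies in [0, ln 2].
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def expert_weight(p_pre, p_exp):
    # ASSUMPTION: map divergence linearly to the weight on the expert model,
    # so agreement (JS near 0) favors the expert and disagreement favors
    # the pretrained model. The paper's actual mapping may differ.
    js = js_divergence(p_pre, p_exp)
    return min(1.0, max(0.0, 1.0 - js / math.log(2)))

def batch_expert_weight(batch_pre, batch_exp):
    # ASSUMPTION: the batch-wise variant shares one coefficient per batch,
    # here computed from the mean divergence over the batch.
    js_vals = [js_divergence(p, q) for p, q in zip(batch_pre, batch_exp)]
    mean_js = sum(js_vals) / len(js_vals)
    return min(1.0, max(0.0, 1.0 - mean_js / math.log(2)))

def merge_weights(theta_pre, theta_exp, lam):
    # Linear interpolation of flattened parameters:
    # (1 - lam) * pretrained + lam * expert. No backpropagation is needed.
    return [(1.0 - lam) * a + lam * b for a, b in zip(theta_pre, theta_exp)]

# Toy usage with hypothetical two-class outputs and two-parameter "models".
p_pre, p_exp = [0.7, 0.3], [0.6, 0.4]
lam = expert_weight(p_pre, p_exp)
merged = merge_weights([0.0, 2.0], [1.0, 4.0], lam)
```

In this sketch the per-sample version recomputes `lam` and the merged parameters for every input, whereas the batch-wise version does so once per batch, which is where the reduction in inference cost would come from.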
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 12886