MedSCAMA: Medical SCale-Aware Multi-Agent Framework for Medical Image Retrieval and Retrieval-Augmented Generation

18 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Medical Image Retrieval, Multi-Scale Representation Learning, Scale-Aware Mixture-of-Experts, Soft Contrastive Learning, Preference Optimization, Adaptive Retrieval, Question Answering, Vision-Language Models
Abstract: Medical image retrieval and retrieval-augmented generation require representations that capture both visual similarity and clinically meaningful semantics across multiple levels of granularity. However, existing image encoders may miss pixel-to-organ-to-context cues, and report encoders may flatten hierarchical findings into global summaries, weakening vision-language alignment and degrading query-target matching in retrieval. To address these gaps, we propose MedSCAMA (Medical SCale-Aware Multi-Agent framework), a collaborative framework for multi-scale feature modeling and retrieval-augmented reasoning. Specifically, ScaFormer encodes multi-scale visual features, the RU and PC Agents provide hierarchical text and similarity cues, and the QA Agent performs adaptive retrieval for evidence-grounded reasoning. Experiments across multiple medical imaging benchmarks demonstrate that MedSCAMA substantially improves both retrieval quality and diagnostic reasoning, yielding more accurate, interpretable, and clinically relevant results than existing approaches. This multi-scale, multi-agent design provides a principled foundation for integrating vision, language, and reasoning in medical AI systems.
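The abstract describes the pipeline only at a high level. As a loose illustration of the two ideas it names, multi-scale encoding and adaptive, confidence-gated retrieval, here is a minimal NumPy sketch. Everything in it is hypothetical: encode_multiscale is a toy stand-in for ScaFormer, the cosine index stands in for the report embeddings, and the threshold tau stands in for the QA Agent's retrieval gate; none of this reflects the paper's actual implementation.

```python
import numpy as np

def encode_multiscale(image, scales=(1, 2, 4)):
    """Toy stand-in for ScaFormer: pool an (H, W, C) image at several
    granularities and concatenate the per-scale descriptors, so one
    embedding carries pixel-, region-, and context-level cues."""
    feats = []
    for s in scales:
        h, w = image.shape[0] // s, image.shape[1] // s
        pooled = image[: h * s, : w * s].reshape(h, s, w, s, -1).mean(axis=(1, 3))
        feats.append(pooled.mean(axis=(0, 1)))  # spatial summary at this scale
    return np.concatenate(feats)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def adaptive_retrieve(query_emb, index, k=3, tau=0.5):
    """Hypothetical QA-Agent gate: return the top-k index entries only
    when the best match clears the confidence threshold tau; otherwise
    signal that no sufficiently grounded evidence was found."""
    scored = sorted(((cosine(query_emb, emb), rid) for rid, emb in index.items()),
                    reverse=True)
    top = scored[:k]
    return top if top and top[0][0] >= tau else []

# Usage with synthetic data: index three "report" embeddings, then query.
rng = np.random.default_rng(0)
index = {f"report_{i}": encode_multiscale(rng.random((64, 64, 3))) for i in range(3)}
query = encode_multiscale(rng.random((64, 64, 3)))
print(adaptive_retrieve(query, index, k=2, tau=0.5))
```

The gate in adaptive_retrieve is one simple way to realize "adaptive retrieval": evidence is only passed on to downstream reasoning when its similarity score is high enough to be trusted.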
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 10436