MCMIAD: Multi-Class Model for Medical Image Anomaly Detection

29 Nov 2025 (modified: 15 Dec 2025) · MIDL 2026 Conference Submission · CC BY 4.0
Keywords: Medical anomaly detection, vision–language models, CLIP
TL;DR: MCMIAD is a unified, lightweight CLIP-based framework that enables efficient, prompt-guided, and modality-agnostic medical anomaly detection.
Abstract: Accurate anomaly detection in medical imaging is critical for clinical decision-making, yet many deployed systems still rely on disease-specific models and large labeled datasets. We present \textbf{MCMIAD}, a unified vision--language framework that couples a frozen EfficientNet image encoder and a CLIP text encoder with a shallow cross-modal fusion block and a denoising Transformer decoder. The framework is designed around three goals: \emph{modality-agnostic deployment}, \emph{prompt-guided explainability}, and \emph{practical efficiency}. MCMIAD keeps the vision backbone frozen and trains only a compact reconstruction head, making the method lightweight enough for typical clinical GPUs. On the BMAD benchmark, MCMIAD achieves strong image- and pixel-level AUROC across retina OCT, brain tumor MRI, and liver tumor CT, with particularly notable gains in one-shot settings where only a single normal example per category is available. Its anomaly heatmaps align with expected clinical regions of interest, supporting human-in-the-loop review. We further analyze the contributions of CLIP-guided cross-attention and model size, and discuss robustness, fairness, and deployment considerations relevant to real-world clinical workflows.
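For a concrete picture of the pipeline described in the abstract, the following PyTorch sketch wires together its stated components: a frozen EfficientNet feature extractor, projected CLIP prompt embeddings, a shallow cross-attention fusion block, and a small Transformer decoder whose per-patch reconstruction error serves as the anomaly map. All module names, dimensions (e.g. a 256-dimensional shared space, EfficientNet-B0 as the backbone), and the error-based scoring are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of an MCMIAD-style model, assuming hypothetical dimensions and
# module names (the paper's exact configuration is not public in this section).
import torch
import torch.nn as nn
import torchvision.models as tvm


class CrossModalFusion(nn.Module):
    """Shallow fusion block: image tokens attend to CLIP prompt embeddings."""

    def __init__(self, dim: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens, text_tokens):
        fused, _ = self.attn(img_tokens, text_tokens, text_tokens)
        return self.norm(img_tokens + fused)


class MCMIADSketch(nn.Module):
    def __init__(self, dim: int = 256, text_dim: int = 512):
        super().__init__()
        # Frozen, ImageNet-pretrained EfficientNet-B0 backbone (downloads weights
        # on first use); only the projections, fusion, and decoder are trained,
        # which is what keeps the reconstruction head compact.
        backbone = tvm.efficientnet_b0(weights=tvm.EfficientNet_B0_Weights.DEFAULT)
        self.encoder = backbone.features
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.img_proj = nn.Conv2d(1280, dim, kernel_size=1)
        self.txt_proj = nn.Linear(text_dim, dim)  # CLIP text emb -> shared space
        self.fusion = CrossModalFusion(dim)
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)

    def forward(self, image, text_emb):
        # image: (B, 3, H, W); text_emb: (B, P, text_dim) CLIP prompt embeddings.
        with torch.no_grad():
            feat = self.encoder(image)            # (B, 1280, h, w), frozen
        feat = self.img_proj(feat)                # (B, dim, h, w)
        B, D, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)  # (B, h*w, dim)
        prompts = self.txt_proj(text_emb)         # (B, P, dim)
        fused = self.fusion(tokens, prompts)
        # A denoising variant would corrupt `fused` before decoding; this sketch
        # simply reconstructs the clean tokens.
        recon = self.decoder(fused, prompts)
        err = (recon - tokens).pow(2).mean(-1)    # per-patch reconstruction error
        return err.view(B, h, w)                  # coarse anomaly heatmap


model = MCMIADSketch()
scores = model(torch.randn(2, 3, 224, 224), torch.randn(2, 4, 512))
print(scores.shape)  # torch.Size([2, 7, 7])
```

Upsampling the coarse error map to the input resolution would yield pixel-level heatmaps of the kind the abstract describes for human-in-the-loop review.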
Primary Subject Area: Unsupervised Learning and Representation Learning
Secondary Subject Area: Detection and Diagnosis
Registration Requirement: Yes
Reproducibility: Yes, we will publish the code and an instruction guide for reproducing our results.
Visa & Travel: No
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 119