Is Medical Pretraining Enough When the Modality Is Different? A Study on Endoscopic Polyp Segmentation
Keywords: Transfer learning, Pretraining, Polyp segmentation, ViT, ResNet
TL;DR: We compare ImageNet and RadImageNet pretraining for endoscopic segmentation and find that ImageNet-pretrained models consistently perform better, suggesting that visual similarity may matter more than domain-specific pretraining.
Abstract: Fine-tuning pretrained models is a widely adopted strategy in medical imaging, where labeled data is scarce. ImageNet remains the standard for pretraining in computer vision tasks, including medical imaging. RadImageNet, a medical-specific alternative trained on radiological data, has shown promising results in radiology-focused applications; however, its effectiveness in non-radiological modalities, such as endoscopy, remains unexplored. In this study, we conduct a focused evaluation of how transfer learning from ImageNet and RadImageNet affects performance in endoscopic segmentation. We compare two backbone architectures---ResNet-50 and ViT-Small---each integrated into a DeepLabV3+ decoder, and evaluate their performance on three public polyp segmentation datasets: CVC-ClinicDB, Kvasir-SEG, and SUN-SEG. Our results show that ImageNet-pretrained models consistently outperform those pretrained on RadImageNet. These findings challenge the notion that medical-domain pretraining is universally beneficial and underscore the importance of modality alignment when selecting pretrained models for medical image analysis.
Submission Number: 95