Is Medical Pretraining Enough When the Modality Is Different? A Study on Endoscopic Polyp Segmentation
Keywords: Transfer learning, Pretraining, Polyp segmentation, ViT, ResNet
TL;DR: A comparison of natural- and medical-image pretraining for endoscopic polyp segmentation, using ResNet-50 and ViT-Small encoders with a DeepLabV3+ decoder
Abstract: Fine-tuning pretrained models is a widely adopted strategy in medical imaging, where labeled data is scarce. While ImageNet remains the standard pretraining dataset in computer vision, RadImageNet, a radiology-specific alternative, has shown promise on radiology-related vision tasks. However, its effectiveness in non-radiological modalities such as endoscopy remains unclear. In this study, we conduct a focused evaluation of how transfer learning from natural or medical images affects performance in endoscopic polyp segmentation, using ImageNet, RadImageNet, and a histopathology dataset for pretraining. Two backbone architectures, ResNet-50 and ViT-Small, are integrated with a DeepLabV3+ decoder and evaluated on three public datasets: CVC-ClinicDB, Kvasir-SEG, and SUN-SEG. ImageNet-pretrained models consistently outperform those pretrained on medical datasets. These results show that medical-domain pretraining is not universally beneficial and underscore the need for modality alignment when selecting pretrained models for medical imaging tasks.
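To make the described setup concrete, below is a minimal sketch (not the authors' code) of the ResNet-50 + DeepLabV3+ configuration, assuming the segmentation_models_pytorch library; the RadImageNet and histopathology variants would require loading encoder weights manually, and the ViT-Small encoder is omitted.

```python
# Minimal sketch: DeepLabV3+ with an ImageNet-pretrained ResNet-50
# encoder for binary polyp segmentation. Assumes the
# segmentation_models_pytorch package; swapping in RadImageNet or
# histopathology weights would mean loading a state dict into the
# encoder instead of using the built-in "imagenet" option.
import torch
import segmentation_models_pytorch as smp

model = smp.DeepLabV3Plus(
    encoder_name="resnet50",
    encoder_weights="imagenet",  # natural-image pretraining baseline
    in_channels=3,               # RGB endoscopy frames
    classes=1,                   # binary mask: polyp vs. background
)

# Forward pass on a dummy batch; spatial size should be divisible by
# the encoder's output stride.
x = torch.randn(2, 3, 256, 256)
logits = model(x)  # shape: (2, 1, 256, 256)
```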
Submission Number: 95