DiCoH: Rethinking Self-Supervised Pretraining for Semantic Segmentation in Homogeneous Medical Domains
Keywords: Semantic segmentation, self-supervised learning, pre-training
TL;DR: Self-supervised pre-training geared for homogeneous data
Abstract: Self-supervised learning (SSL) for pretraining has become critical for improving segmentation performance when labeled data is scarce. However, existing contrastive methods are primarily designed for diverse, object-centric natural images and struggle to generalize to $\textit{homogeneous}$ medical datasets that exhibit low semantic variation across both images and pixels. This low semantic variation makes aligning positive pixel-to-pixel pairs trivial and identifying true negative pairs extremely challenging. Additionally, we identify that $\textit{architectural asymmetry}$, demonstrated to stabilize contrastive pretraining, is detrimental when applied to homogeneous data. To tackle these limitations, we present $\textbf{Di}$verse $\textbf{Co}$ntrastive Learning for $\textbf{H}$omogeneous Data (DiCoH), an SSL pretraining framework for homogeneous medical data. DiCoH improves representation learning by diversifying positive pixel-to-pixel alignments and guaranteeing true negative pairs through a novel $\textit{hard}$ pixel-to-image selection strategy. Comprehensive evaluations on five medical segmentation datasets demonstrate that DiCoH significantly and consistently outperforms state-of-the-art SSL methods, achieving +2.00\% mIoU gains under extremely low-data conditions.
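The abstract does not give DiCoH's exact loss, but the core idea (pixel-level contrastive learning with hard, guaranteed-true negatives drawn from other images) can be sketched roughly as follows. All function names, shapes, and the top-K hard-negative heuristic below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a pixel-level InfoNCE loss with hard
# pixel-to-image negative selection, in the spirit of DiCoH.
# Shapes, names, and the top-K heuristic are assumptions for illustration.
import numpy as np

def pixel_contrastive_loss(anchors, positives, candidate_negatives, tau=0.1, k=8):
    """Pixel-level InfoNCE with hard negatives.

    anchors, positives: (N, D) L2-normalized pixel embeddings.
    candidate_negatives: (M, D) pixel embeddings pooled from *other* images;
    drawing them from images with disjoint content is what would guarantee
    they are true negatives (an assumption of this sketch).
    For each anchor, only the k most similar candidates are kept as
    "hard" negatives.
    """
    k = min(k, candidate_negatives.shape[0])
    sim_neg = anchors @ candidate_negatives.T                 # (N, M)
    hard_idx = np.argsort(-sim_neg, axis=1)[:, :k]            # top-k hardest
    hard = np.take_along_axis(sim_neg, hard_idx, axis=1)      # (N, k)
    pos = np.sum(anchors * positives, axis=1, keepdims=True)  # (N, 1)
    logits = np.concatenate([pos, hard], axis=1) / tau
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob.mean()
```

In this sketch, well-aligned positive pairs with dissimilar negatives yield a low loss, while misaligned pairs yield a high one; the hard-negative mining keeps the denominator informative even when most cross-image pixels are near-duplicates, which is the failure mode the abstract attributes to homogeneous data.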
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 9631