DiCoH: Rethinking Self-Supervised Pretraining for Semantic Segmentation in Homogeneous Medical Domains
Keywords: Semantic segmentation, self-supervised learning, pre-training
TL;DR: Self-supervised pre-training geared for homogeneous data
Abstract: Self-supervised learning (SSL) for pretraining has become critical for improving segmentation performance when labeled data is scarce. However, existing contrastive methods are primarily designed for diverse, object-centric natural images and struggle to generalize to $\textit{homogeneous}$ medical datasets that exhibit low semantic variation across both images and pixels. This low semantic variation makes aligning positive pixel-to-pixel pairs trivial and identifying true negative pairs extremely challenging. Additionally, we identify that $\textit{architectural asymmetry}$, demonstrated to stabilize contrastive pretraining, is detrimental when applied to homogeneous data. To tackle these limitations, we present $\textbf{Di}$verse $\textbf{Co}$ntrastive Learning for $\textbf{H}$omogeneous Data (DiCoH), an SSL pretraining framework for homogeneous medical data. DiCoH improves representation learning by diversifying positive pixel-to-pixel alignments and guaranteeing true negative pairs through a novel $\textit{hard}$ pixel-to-image selection strategy. Comprehensive evaluations on five medical segmentation datasets demonstrate that DiCoH significantly and consistently outperforms state-of-the-art SSL methods, achieving +2.00\% mIoU gains under extremely low-data conditions.
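The abstract does not give DiCoH's exact loss, but the core idea (pixel-level contrastive learning with hard, guaranteed-true negatives drawn from other images) can be sketched roughly as follows. All function names, shapes, and the top-K hard-negative heuristic below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a pixel-level InfoNCE loss with hard
# pixel-to-image negative selection, in the spirit of DiCoH.
# Shapes, names, and the top-K heuristic are assumptions for illustration.
import numpy as np

def pixel_contrastive_loss(anchors, positives, candidate_negatives, tau=0.1, k=8):
    """Pixel-level InfoNCE with hard negatives.

    anchors, positives: (N, D) L2-normalized pixel embeddings.
    candidate_negatives: (M, D) pixel embeddings pooled from *other* images;
    drawing them from images with disjoint content is what would guarantee
    they are true negatives (an assumption of this sketch).
    For each anchor, only the k most similar candidates are kept as
    "hard" negatives.
    """
    k = min(k, candidate_negatives.shape[0])
    sim_neg = anchors @ candidate_negatives.T                 # (N, M)
    hard_idx = np.argsort(-sim_neg, axis=1)[:, :k]            # top-k hardest
    hard = np.take_along_axis(sim_neg, hard_idx, axis=1)      # (N, k)
    pos = np.sum(anchors * positives, axis=1, keepdims=True)  # (N, 1)
    logits = np.concatenate([pos, hard], axis=1) / tau
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob.mean()
```

In this sketch, well-aligned positive pairs with dissimilar negatives yield a low loss, while misaligned pairs yield a high one; the hard-negative mining keeps the denominator informative even when most cross-image pixels are near-duplicates, which is the failure mode the abstract attributes to homogeneous data.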
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 9631