Strength in Diversity: Understanding the impacts of diverse training sets in self-supervised pre-training for histology images
Keywords: Self-supervised learning, digital histopathology
TL;DR: This paper explores cross-domain self-supervised learning in digital histopathology images.
Abstract: Self-supervised learning (SSL) has demonstrated success in computer vision tasks for natural images and, more recently, for histopathological images, where annotations are scarce. Despite this, there has been limited research into how the diversity of the source data used for SSL affects performance. This study quantifies changes in downstream classification of metastatic tissue in lymph node sections from the PatchCamelyon dataset when datasets from different domains (natural images, textures, histology) are used for SSL pre-training. We show that, when training data are limited, SSL pre-training on diverse datasets from different domains can achieve performance comparable to SSL pre-training on the target dataset.
Paper Type: validation/application paper
Primary Subject Area: Application: Histopathology
Secondary Subject Area: Transfer Learning and Domain Adaptation
Paper Status: original work, not submitted yet
Source Code Url: https://github.com/kristinakupf/Histo_StrengthInDiversity
Data Set Url: PatchCamelyon (PCam) Dataset: https://github.com/basveeling/pcam; Colorectal Cancer (CRC) Dataset: https://zenodo.org/record/53169#.YGz4ymRueWD; ALOT Dataset: http://color.univ-lille.fr/datasets/alot; TinyImageNet Dataset: https://www.kaggle.com/c/tiny-imagenet/data
Registration: I acknowledge that publication of this at MIDL and in the proceedings requires at least one of the authors to register and present the work during the conference.
Authorship: I confirm that I am the author of this work and that it has not been submitted to another publication before.