Attention De-sparsification Matters: Inducing Diversity in Digital Pathology Representation LearningDownload PDF

Published: 01 Feb 2023, Last Modified: 13 Feb 2023Submitted to ICLR 2023Readers: Everyone
Keywords: Computational pathology, Cell segmentation, Self supervised learning, Vision Transformer, Sparse attention
TL;DR: We introduce Di-SSL, a diversity-inducing self-supervised learning method to enhance the representation learning in Digital Pathology.
Abstract: In this work, we develop Di-SSL, a Diversity-inducing Self-Supervised Learning technique for histopathology image analysis. SSL techniques, such as contrastive and non-contrastive approaches, have been shown to learn rich and effective rep- resentations without any human supervision. Lately, computational pathology has also benefited from the resounding success of SSL. In this work, we develop a novel domain-aware pretext task to enhance representation learning in digital pathology. Our analysis of vanilla SSL-pretrained models’ attention distribution reveals an insightful observation: sparsity in attention, i.e, models tends to localize most of their attention to some prominent patterns in the image. Although atten- tion sparsity can be beneficial in natural images due to these prominent patterns being the object of interest itself, this can be sub-optimal in digital pathology; this is because, unlike natural images, digital pathology scans are not object-centric, but rather a complex phenotype of various spatially intermixed biological com- ponents. Inadequate diversification of attention in these complex images could result in crucial information loss. To address this, we first leverage cell segmenta- tion to densely extract multiple histopathology-specific representations. We then propose a dense pretext task for SSL, designed to match the multiple correspond- ing representations between the views. Through this, the model learns to attend to various components more closely and evenly, thus inducing adequate diversi- fication in attention for capturing context rich representations. Through quantita- tive and qualitative analysis on multiple slide-level tasks across cancer types, and patch-level classification tasks, we demonstrate the efficacy of our method and observe that the attention is more globally distributed. Specifically, we obtain a relative improvement in accuracy of up to 6.9% in slide-level and 2% in patch level classification tasks (corresponding AUC improvement up to 7.9% and 0.7%, respectively) over a baseline SSL model.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Machine Learning for Sciences (eg biology, physics, health sciences, social sciences, climate/sustainability )
22 Replies