- Keywords: Self-supervised learning, clustering, histopathology, recurrence, lung squamous cell carcinoma
- TL;DR: We build an interpretable survival model to predict recurrence in LSCC based on self-supervised and clustering algorithms. It outperforms current approaches and facilitates the explainability of the histopathological risk factors for LSCC recurrence.
- Abstract: Lung squamous cell carcinoma (LSCC) has a high recurrence and metastasis rate. Factors influencing recurrence and metastasis are currently unknown and there are no distinct histopathological or morphological features indicating the risks of recurrence and metastasis in LSCC. Our study focuses on the recurrence prediction of LSCC based on H&E-stained histopathological whole-slide images (WSI). Due to the small size of LSCC cohorts in terms of patients with available recurrence information, standard end-to-end learning with various convolutional neural networks for this task tends to overfit. Also, the predictions made by these models are hard to interpret. Histopathology WSIs are typically very large and are therefore processed as a set of smaller tiles. In this work, we propose a novel conditional self-supervised learning (SSL) method to learn representations of WSI at the tile level first, and leverage clustering algorithms to identify the tiles with similar histopathological representations. The resulting representations and clusters from self-supervision are used as features of a survival model for recurrence prediction at the patient level. Using two publicly available datasets from TCGA and CPTAC, we show that our LSCC recurrence prediction survival model outperforms both LSCC pathological stage-based approach and machine learning baselines such as multiple instance learning. The proposed method also enables us to explain the recurrence histopathological risk factors via the derived clusters. This can help pathologists derive new hypotheses regarding morphological features associated with LSCC recurrence.
- Registration: I acknowledge that publication of this at MIDL and in the proceedings requires at least one of the authors to register and present the work during the conference.
- Authorship: I confirm that I am the author of this work and that it has not been submitted to another publication before.
- Paper Type: both
- Primary Subject Area: Application: Histopathology
- Secondary Subject Area: Unsupervised Learning and Representation Learning
- Confidentiality And Author Instructions: I read the call for papers and author instructions. I acknowledge that exceeding the page limit and/or altering the latex template can result in desk rejection.
- Code And Data: https://github.com/NYUMedML/conditional_ssl_hist