Keywords: Multimodal Learning, Tumor Detection, Outcome Prediction
TL;DR: This paper introduces a deep learning framework that predicts outcomes from whole slide histopathology images (WSIs) and, optionally, clinical data. It outperforms Multiple Instance Learning (MIL) methods in predicting biopsy Gleason Grade, metastasis, and BRCA2 mutation.
Abstract: Accurate outcome prediction is central to effective cancer management in histopathology. We present a multimodal deep learning framework that integrates information from whole slide images (WSIs) and, optionally, clinical data to improve outcome prediction. The first stage detects tumor regions using a custom UNet with a ConvNeXtv2 encoder and a decoder with residual connections, a bottleneck, and squeeze-and-excitation (SE) blocks. To make training more efficient and improve generalization, we introduce a patch selection method that chooses the regions passed on to later stages. The second stage extracts compact feature representations from the selected regions using a ResNeXt50 network pre-trained with DINO. The third stage aggregates these features, combines them with clinical parameters (if available), and predicts outcomes via ResNet18. We evaluate the framework on metastasis prediction (prostate cancer WSIs) and BRCA2 mutation prediction (multiple sites). In a comparative evaluation, it outperforms Multiple Instance Learning (MIL) approaches and makes effective use of multimodal data.
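The third stage described in the abstract (aggregate patch features, optionally append clinical parameters, then predict) can be illustrated with a minimal sketch. The abstract does not say how features are aggregated, so mean pooling is an assumption here, and the linear-plus-sigmoid head is only a placeholder for the ResNet18 predictor. All function names (`mean_pool`, `fuse`, `predict_probability`) are hypothetical and not taken from the paper.

```python
import math
from typing import List, Optional, Sequence


def mean_pool(patch_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-patch feature vectors into one slide-level vector.

    Assumption: mean pooling stands in for the paper's aggregation step,
    which the abstract does not specify.
    """
    if not patch_features:
        raise ValueError("need at least one patch feature vector")
    n = len(patch_features)
    dim = len(patch_features[0])
    return [sum(vec[i] for vec in patch_features) / n for i in range(dim)]


def fuse(slide_vector: Sequence[float],
         clinical: Optional[Sequence[float]] = None) -> List[float]:
    """Concatenate slide features with clinical parameters when available."""
    fused = list(slide_vector)
    if clinical is not None:
        fused.extend(clinical)
    return fused


def predict_probability(fused: Sequence[float],
                        weights: Sequence[float],
                        bias: float = 0.0) -> float:
    """Placeholder head: linear layer followed by a sigmoid.

    Assumption: this replaces the ResNet18 predictor purely for illustration.
    """
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(x * w for x, w in zip(fused, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Two patches with 2-dimensional features and one clinical parameter.
patches = [[1.0, 3.0], [3.0, 5.0]]
slide = mean_pool(patches)
with_clinical = fuse(slide, clinical=[0.5])
image_only = fuse(slide)
```

Because the clinical vector is optional, the same code path handles image-only and multimodal inputs; only the length of the fused vector (and therefore of the head's weights) changes.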
Submission Number: 52