Report-Guided Cross-Modal Representation Learning for Predicting EGFR Mutations by Whole Slide Image

Qi Qiao, Jun Shi, Zhiguo Jiang, Wei Wang, Haibo Wu, Yushan Zheng

Published: 2024, Last Modified: 05 Mar 2025, BIBM 2024, CC BY-SA 4.0

Abstract: Traditional PCR/NGS-based multigene panel testing is time-consuming and costly. Predicting EGFR mutations directly from H&E-stained whole slide images (WSIs) can alleviate these limitations. Histopathological reports also contain valuable textual information that correlates with tissue areas in WSIs. However, recent research mainly predicts EGFR mutation status from a single modality, ignoring the rich information contained in reports. In this paper, we propose a report-guided cross-modal representation learning method for predicting EGFR mutations from WSIs. Specifically, we reconstruct report-level embeddings by exploring the intrinsic relationships between diagnostic words in histopathological reports and tissue areas in WSIs. The reconstructed histopathological report embedding and the aggregated WSI embedding are then fused for the final prediction. In addition, the molecular testing report is introduced as prior supervision at the training stage to enforce semantic consistency between the fused feature and the molecular report embedding. We evaluate our method on the TCGA-EGFR public benchmark dataset and an in-house clinical dataset (USTC-EGFR). Experimental results demonstrate that our method outperforms existing approaches in EGFR mutation prediction, highlighting the benefit of cross-modal learning for feature representation. The code is available at https://github.com/HFUT-miaLab/RCRL.
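
The three steps named in the abstract (report-embedding reconstruction, fusion, and training-time consistency with the molecular report) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, which is in the linked repository. It assumes scaled dot-product attention from report words to WSI patches, mean pooling as the aggregator, concatenation as the fusion, and a cosine-based consistency loss, none of which the abstract specifies. All function names and the toy vectors are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def reconstruct_report_embedding(word_embs, patch_embs):
    """Assumed mechanism: each diagnostic word attends over all patches;
    the attended vectors are mean pooled into one report-level embedding."""
    scale = math.sqrt(len(patch_embs[0]))
    attended = []
    for w in word_embs:
        weights = softmax([dot(w, p) / scale for p in patch_embs])
        dim = len(patch_embs[0])
        attended.append(
            [sum(wt * p[i] for wt, p in zip(weights, patch_embs)) for i in range(dim)]
        )
    return mean(attended)

def fuse(report_emb, wsi_emb):
    # Assumed fusion: plain concatenation of the two embeddings.
    return list(report_emb) + list(wsi_emb)

def consistency_loss(fused, molecular_emb):
    """Assumed loss: 1 - cosine similarity. The molecular report embedding
    is taken to be already projected to the fused dimension."""
    norm = math.sqrt(dot(fused, fused)) * math.sqrt(dot(molecular_emb, molecular_emb))
    return 1.0 - dot(fused, molecular_emb) / norm

# Toy data: 2 word embeddings and 3 patch embeddings, all 2-dimensional.
word_embs = [[1.0, 0.0], [0.0, 1.0]]
patch_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

report_emb = reconstruct_report_embedding(word_embs, patch_embs)
wsi_emb = mean(patch_embs)            # assumed WSI aggregator: mean pooling
fused = fuse(report_emb, wsi_emb)     # 4-dimensional fused feature
loss = consistency_loss(fused, fused) # identical vectors, so loss is near 0
```

In this sketch the consistency loss would only be computed during training, since the abstract states that the molecular testing report serves as supervision at the training stage; at inference only the fused feature would feed the classifier.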