MFMF: Multiple Foundation Model Fusion Networks for Whole Slide Image Classification

Published: 01 Jan 2024 · Last Modified: 04 Mar 2025 · BCB 2024 · CC BY-SA 4.0
Abstract: Tumor detection and subtyping remain significant challenges in histopathology image analysis. As digital pathology progresses, deep learning applications become essential. Whole Slide Image (WSI) classification has emerged as a crucial task in digital pathology, vital for accurate cancer diagnosis and treatment. In this paper, we introduce an innovative abnormality-guided Multiple Foundation Model Fusion (MFMF) framework, aimed at enhancing WSI classification by integrating multi-level information from pathology images with Multiple Instance Learning (MIL). Traditional methods often focus on patch-level features while neglecting the rich contextual and morphological details at the cell and text levels, thus failing to fully exploit the multidimensional nature of WSIs. Our method enhances traditional models by efficiently integrating patch-level, cell-level, and text-level features extracted by three foundation models. These features are then fused through a novel three-step cross-attention module that effectively combines cell and text information with patch-level features. Furthermore, unlike most studies that select instances by attention score on the assumption that high scores indicate the presence of a tumor, we design an abnormality-aware module that naturally identifies and detects abnormal features (i.e., tumors) as the criterion for selecting important instances, thereby reducing computational costs and boosting overall performance. We validate our approach against leading benchmarks on the CAMELYON16 and TCGA-Lung datasets, achieving superior classification performance. Our study not only tackles the challenges of sparsity and noise in multi-level features but also enhances the efficiency and accuracy of WSI classification by exploiting abnormal features.
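The fusion and instance-selection ideas above can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' implementation): plain scaled dot-product cross-attention is used to enrich patch-level features first with cell-level and then with text-level features, and a simple linear "abnormality" head scores instances so only the top-k are kept. All function names, the residual connections, and the linear scoring head are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, kv):
    # scaled dot-product cross-attention: queries q attend to keys/values kv
    d = q.shape[-1]
    scores = q @ kv.T / np.sqrt(d)      # (n_q, n_kv) attention logits
    return softmax(scores) @ kv         # (n_q, d) attended features

def mfmf_fuse(patch, cell, text):
    # hypothetical multi-step fusion: patch features are successively
    # enriched with cell-level and text-level context (residual form assumed)
    h = patch + cross_attention(patch, cell)   # step 1: cell-level context
    h = h + cross_attention(h, text)           # step 2: text-level context
    return h                                   # step 3 in the paper would refine further

def select_abnormal(features, w, k):
    # hypothetical abnormality-aware head: linear score per instance,
    # keep the k most "abnormal" (highest-scoring) instances
    scores = features @ w
    idx = np.argsort(scores)[::-1][:k]
    return features[idx], scores[idx]

# toy usage with random features from three (assumed) foundation models
rng = np.random.default_rng(0)
patch = rng.normal(size=(8, 16))   # 8 patch instances, dim 16
cell = rng.normal(size=(32, 16))   # 32 cell-level tokens
text = rng.normal(size=(4, 16))    # 4 text-level tokens
fused = mfmf_fuse(patch, cell, text)
top, top_scores = select_abnormal(fused, rng.normal(size=16), k=3)
```

In this sketch, only the k selected instances would be passed to the downstream MIL classifier, which is how such a selection step can cut computational cost relative to scoring every instance.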