Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

Published: 01 Jan 2024, Last Modified: 14 Nov 2024 · CVPR 2024 · CC BY-SA 4.0
Abstract: Whole Slide Image (WSI) classification is often formulated as a Multiple Instance Learning (MIL) problem. Recently, Vision-Language Models (VLMs) have demonstrated remarkable performance in WSI classification. However, existing methods leverage coarse-grained pathogenetic descriptions for visual representation supervision, which are insufficient to capture the complex visual appearance of pathogenetic images, hindering the generalizability of models on diverse downstream tasks. Additionally, processing high-resolution WSIs can be computationally expensive. In this paper, we propose a novel "Fine-grained Visual-Semantic Interaction" (FiVE) framework for WSI classification. It is designed to enhance the model's generalizability by leveraging the interaction between localized visual patterns and fine-grained pathological semantics. Specifically, with meticulously designed queries, we start by utilizing a large language model to extract fine-grained pathological descriptions from various non-standardized raw reports. The output descriptions are then reconstructed into fine-grained labels used for training. By introducing a Task-specific Fine-grained Semantics (TFS) module, we enable prompts to capture crucial visual information in WSIs, which enhances representation learning and significantly augments generalization capabilities. Furthermore, given that pathological visual patterns are redundantly distributed across tissue slices, we sample a subset of visual instances during training. Our method demonstrates robust generalizability and strong transferability, decisively outperforming its counterparts on the TCGA Lung Cancer dataset with at least 9.19% higher accuracy in few-shot experiments. The code is available at: https://github.com/lslrius/WSI_FiVE.
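
For readers unfamiliar with the MIL setup described above, the sketch below illustrates two ideas mentioned in the abstract: randomly sampling a subset of patch-level instances per WSI bag during training (since pathological patterns are assumed to be redundantly distributed across tissue), and scoring the pooled bag embedding against text embeddings of per-class fine-grained descriptions. This is a minimal illustrative sketch, not the authors' implementation (see the linked repository for that); the module names, embedding dimensions, gated-attention pooling, and CLIP-style cosine scoring are all assumptions made for illustration.

```python
# Minimal MIL + text-scoring sketch (illustrative assumptions, not the FiVE code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionMILHead(nn.Module):
    """Attention pooling over patch-instance embeddings (a common MIL choice)."""

    def __init__(self, dim: int = 512, hidden: int = 256):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, instances: torch.Tensor) -> torch.Tensor:
        # instances: (num_patches, dim) -> bag embedding: (dim,)
        weights = torch.softmax(self.attn(instances), dim=0)  # (num_patches, 1)
        return (weights * instances).sum(dim=0)


def sample_instances(instances: torch.Tensor, max_instances: int) -> torch.Tensor:
    """Randomly keep at most `max_instances` patches per bag to reduce compute,
    assuming redundant visual patterns make a subset sufficient for training."""
    n = instances.shape[0]
    if n <= max_instances:
        return instances
    idx = torch.randperm(n)[:max_instances]
    return instances[idx]


def classify_bag(instances, text_embeds, mil_head, temperature=0.07):
    """Cosine-similarity logits between the pooled bag embedding and text
    embeddings of per-class fine-grained descriptions (CLIP-style scoring)."""
    bag = F.normalize(mil_head(instances), dim=-1)       # (dim,)
    text = F.normalize(text_embeds, dim=-1)              # (num_classes, dim)
    return bag @ text.t() / temperature                  # (num_classes,)


if __name__ == "__main__":
    # Toy usage: 3,000 pre-extracted patch features, two classes (e.g. LUAD vs. LUSC).
    patches = torch.randn(3000, 512)
    class_text = torch.randn(2, 512)  # stand-in for encoded class descriptions
    head = AttentionMILHead(dim=512)
    subset = sample_instances(patches, max_instances=512)
    logits = classify_bag(subset, class_text, head)
    print(logits.shape)  # torch.Size([2])
```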
