Keywords: Whole Slide Image, Vision–Language Learning, Pathology Reports, Weakly Supervised Learning, Cross-Modal Alignment, Contrastive Learning
Abstract: Vision–language learning has become a powerful framework for multimodal representation, achieving exceptional performance across diverse image–text tasks. In histopathology, however, existing methods often rely on high-resolution region-level annotations to achieve fine-grained visual–textual alignment, a requirement that is impractical for Whole Slide Image (WSI) classification given the gigapixel scale of pathology images and the weak supervision provided by slide-level labels. To address this challenge, we propose Report-Conditioned Attentive Patching for Weakly Supervised WSI Classification (ReCAP), a novel approach that leverages slide-level pathology reports to enrich patch-level feature learning without requiring localized supervision. Instead of relying on explicit region annotations, ReCAP adopts a hybrid multimodal contrastive multiple instance learning (MIL) framework in which report-conditioned text embeddings guide cross-attention to highlight semantically discriminative tissue regions. We further introduce a self-normalizing cross-modal attended similarity function that improves the robustness and stability of patch–text alignment under weak supervision. In addition, our approach incorporates an efficient report-aware patch aggregation strategy that suppresses redundant or noisy regions while retaining the most diagnostically informative patterns within the vision–language context. Across multiple cancer subtype classification and survival prediction tasks, ReCAP consistently improves performance by 2–5%, demonstrating the effectiveness of report-conditioned cross-modal alignment for scalable and annotation-efficient WSI understanding.
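For intuition, the following is a minimal PyTorch sketch of the report-conditioned cross-attention and attended-similarity idea described in the abstract. All names, tensor shapes, and the single-query formulation are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def report_conditioned_attention(patch_emb, report_emb, temperature=1.0):
    """Cross-attention where the report embedding queries the patch bag.

    patch_emb:  (N, D) patch features from a frozen encoder (assumed shape).
    report_emb: (D,)   slide-level report embedding from a text encoder.
    Returns the attended slide representation and the attention weights.
    """
    # Scaled dot-product scores between the report query and each patch key.
    scores = patch_emb @ report_emb / (patch_emb.shape[-1] ** 0.5)  # (N,)
    attn = torch.softmax(scores / temperature, dim=0)               # (N,)
    slide_repr = attn @ patch_emb                                   # (D,)
    return slide_repr, attn

def attended_similarity(patch_emb, report_emb):
    """Cross-modal attended similarity (a self-normalizing sketch).

    Cosine similarity bounds the output in [-1, 1], which keeps
    contrastive logits stable under weak slide-level supervision.
    """
    slide_repr, _ = report_conditioned_attention(patch_emb, report_emb)
    return F.cosine_similarity(slide_repr, report_emb, dim=0)

# Toy usage: 512 patches with 256-dim features, one report embedding.
patches = F.normalize(torch.randn(512, 256), dim=-1)
report = F.normalize(torch.randn(256), dim=-1)
sim = attended_similarity(patches, report)
```

In this sketch the attention weights double as a soft region-selection signal, which is one plausible way the report-aware aggregation could suppress redundant patches while keeping diagnostically informative ones.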
Primary Subject Area: Application: Histopathology
Secondary Subject Area: Detection and Diagnosis
Registration Requirement: Yes
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 362