Keywords: Causal Learning, Whole Slide Image Classification, Vision-Language, Large Language Model
Abstract: Multiple Instance Learning (MIL) is a core method for Whole Slide Image (WSI) classification in computational pathology, but models that rely solely on visual representations often misclassify slides because the visual signal alone carries insufficient information. Large Language Models (LLMs) can supply rich textual prompts to enrich these visual representations. However, the data-driven training of LLMs often induces spurious correlations between visual signals and text, yielding inaccurate textual descriptions that pollute the alignment process and degrade WSI classification performance. To address this issue, we propose a Causal-learning Dual-attention MIL framework (CDMIL). CDMIL first performs preliminary alignment through a prototype-guided dual-attention mechanism and then applies a counterfactual learning strategy as a causal intervention: replacing factual text with counterfactual text forces the model to abandon its reliance on spurious correlations and instead learn genuine causal relationships. Experiments demonstrate that CDMIL achieves state-of-the-art performance in both accuracy and out-of-distribution robustness, validating the superiority of this causal learning framework. The code will be released at https://github.com/xxx/CDMIL.
Primary Area: causal reasoning
Submission Number: 3577