Enriching Context for Pathological Diagnosis via Multimodal Auxiliary Information

ACL ARR 2025 May Submission3467 Authors

19 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0
Abstract: Although computational pathology has substantially advanced the automated analysis of pathological images, its reliance on visual features overlooks the multimodal context that human pathologists integrate, which constrains diagnostic accuracy. This study explores enhancing pathological diagnosis by providing models with three types of auxiliary information during inference: clinical history, terminology explanations, and visual in-context examples. We fine-tune a vision-language model for pathological diagnosis through large-scale pretraining and instruction-following optimization. Experiments across slide-level diagnosis, region-of-interest subtyping, and invasion detection tasks demonstrate significant improvements with enriched context. Our findings highlight the potential of enriching context with auxiliary information to bridge the gap between human diagnosis and computational pathology.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: multimodal applications, healthcare applications
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 3467