Abstract: Although computational pathology has substantially advanced the automated analysis of pathological images, its reliance on visual features alone overlooks the multimodal context that human pathologists integrate, thereby constraining diagnostic accuracy. This study explores enhancing pathological diagnosis by providing models with three types of auxiliary information during inference: clinical history, terminology explanations, and visual in-context examples. We fine-tune a vision-language model for pathological diagnosis with large-scale pretraining and instruction-following optimization. Experiments across slide-level diagnosis, region-of-interest subtyping, and invasion detection tasks demonstrate significant improvements with enriched context. Our findings highlight the potential of enriching context with auxiliary information to bridge the gap between human diagnosis and computational pathology.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: multimodal applications, healthcare applications
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 3467