Few-Shot Whole Slide Pathology Classification with Multi-Granular Vision-Language Models

Anh-Tien Nguyen; Duy Minh Ho Nguyen; Nghiem Tuong Diep; Trung Quoc Nguyen; Nhat Ho; Jacqueline Michelle Metsch; Miriam Cindy Maurer; Daniel Sonntag; Hanibal Bohnenberger; Anne-Christin Hauschild

Published: 06 Mar 2025, Last Modified: 01 Apr 2025. Venue: ICLR 2025 FM-Wild Workshop. License: CC BY 4.0

Keywords: few-shot pathology, vision-language models

TL;DR: A multi-granular prompt learning method to advance few-shot pathology classification.

Abstract: In this study, we propose a novel architecture for a large vision-language model adapted with a multi-granular prompt learning method to advance few-shot pathology classification. Starting with the Prov-GigaPath foundation model, which is pre-trained on 1.3 billion pathology image patches, we extend it into a vision-language model by adding adaptors and aligning it with medical text encoders via contrastive learning on 923K image-text pairs. In contrast to previous approaches that combine prompts with frozen features using prefix embeddings or self-attention, our multi-granular attention mechanism evaluates interactions between learnable prompts, individual image patches, and patch groups, capturing both fine details and broader context. We further improve precision with an unbalanced optimal transport-based visual-text distance that mitigates perturbations from data augmentation. Experiments on lung and kidney pathology imaging modalities show that our method outperforms state-of-the-art competitors and improves performance across various architectures, including CLIP, PLIP, and the Prov-GigaPath-integrated PLIP.
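The abstract does not spell out the unbalanced optimal transport distance, so the following is only a minimal sketch under stated assumptions: image features (e.g., patch embeddings) and text prompt features are compared with a cosine cost, both sides carry uniform mass, and the marginal constraints are relaxed with a KL penalty of weight `tau` and solved by entropic Sinkhorn-style scaling with regularization `eps`. The function names `cosine_cost`, `unbalanced_sinkhorn`, and `uot_distance`, the toy data, and all hyperparameter values are hypothetical and are not taken from the paper. Relaxing the marginals lets the transport plan leave some mass unmatched, which is one way such a distance can down-weight patches distorted by augmentation.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x (n, d) and y (m, d)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T  # shape (n, m), entries in [0, 2]


def unbalanced_sinkhorn(C, a, b, eps=0.1, tau=1.0, n_iter=200):
    """Entropic OT with KL-relaxed marginals (assumed formulation, not the paper's).

    Trades off transport cost, an eps-weighted entropy term, and tau-weighted
    KL penalties that let the plan's marginals deviate from a and b.
    """
    K = np.exp(-C / eps)             # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = tau / (tau + eps)        # exponent < 1 softens the marginal constraints
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]  # transport plan P, shape (n, m)


def uot_distance(x, y, eps=0.1, tau=1.0, n_iter=200):
    """Hypothetical visual-text distance: transport cost <P, C> under the unbalanced plan."""
    C = cosine_cost(x, y)
    a = np.full(x.shape[0], 1.0 / x.shape[0])  # uniform mass on image features (assumption)
    b = np.full(y.shape[0], 1.0 / y.shape[0])  # uniform mass on prompt features (assumption)
    P = unbalanced_sinkhorn(C, a, b, eps, tau, n_iter)
    return float(np.sum(P * C)), P


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patches = rng.normal(size=(16, 8))  # toy "patch" features
    prompts = {c: rng.normal(size=(4, 8)) for c in ("class_a", "class_b")}
    scores = {c: uot_distance(patches, p)[0] for c, p in prompts.items()}
    # Predict the class whose prompt set is closest to the image features.
    print(min(scores, key=scores.get), scores)
```

In this sketch a class is predicted as the one whose prompt set has the smallest distance to the image features. As `tau` grows the exponent approaches 1 and the iteration reduces to ordinary balanced Sinkhorn; smaller `tau` tolerates more unmatched mass.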

Submission Number: 89