Keywords: few-shot pathology, vision-language models
TL;DR: a multi-granular prompt learning method to advance few-shot pathology classification
Abstract: In this study, we propose a novel architecture for a large vision-language model
adapted with a multi-granular prompt learning method to advance few-shot pathol-
ogy classification. Starting from the Prov-GigaPath foundation model, pre-trained
on 1.3 billion pathology image patches, we extend it into a vision-language model
by adding adapters and aligning it with medical text encoders via contrastive
learning on 923K image-text pairs. In contrast to previous approaches that combine
prompts with frozen features using prefix embeddings or self-attention, our multi-
granular attention mechanism models interactions between learnable prompts,
individual image patches, and patch groups, capturing both fine detail and broader
context (sketched below). We further improve precision with an unbalanced optimal-transport-
based visual-text distance that mitigates perturbations from data augmentation (also sketched below).
Experiments on lung and kidney pathology datasets show that our
method outperforms state-of-the-art competitors and improves performance across
various architectures, including CLIP, PLIP, and the Prov-GigaPath-integrated PLIP.
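The following is a minimal sketch of a multi-granular prompt attention block of the kind the abstract describes: learnable prompts attend jointly over individual patch tokens (fine granularity) and pooled patch-group tokens (coarse granularity). All module names, shapes, and hyperparameters here are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch: multi-granular attention between learnable prompts, patches,
# and patch groups. Shapes and hyperparameters are assumed for illustration.
import torch
import torch.nn as nn


class MultiGranularPromptAttention(nn.Module):
    """Learnable prompts attend over patch tokens and pooled patch-group tokens."""

    def __init__(self, dim: int = 768, num_prompts: int = 8,
                 group_size: int = 16, num_heads: int = 8):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)  # learnable prompts
        self.group_size = group_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (B, N, D) frozen patch embeddings from the vision backbone
        b, n, d = patch_feats.shape
        # Coarse granularity: average-pool contiguous patches into groups
        g = n // self.group_size
        group_feats = patch_feats[:, : g * self.group_size].reshape(
            b, g, self.group_size, d).mean(dim=2)
        # Keys/values mix fine (patch) and coarse (group) granularity
        context = torch.cat([patch_feats, group_feats], dim=1)   # (B, N + G, D)
        queries = self.prompts.unsqueeze(0).expand(b, -1, -1)    # (B, P, D)
        prompt_out, _ = self.attn(queries, context, context)     # prompts attend to both scales
        return prompt_out                                        # (B, P, D) prompt features


# Example: 196 patches from a 14x14 grid
feats = torch.randn(2, 196, 768)
print(MultiGranularPromptAttention()(feats).shape)  # torch.Size([2, 8, 768])
```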
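Likewise, a hedged sketch of an unbalanced optimal-transport visual-text distance, computed with generalized Sinkhorn iterations whose marginal constraints are softened by a KL penalty (so augmentation-induced perturbations need not be fully transported). The cost function, regularization strengths, and iteration count are assumptions for illustration.

```python
# Hedged sketch: unbalanced OT distance between visual and text token sets,
# via entropic Sinkhorn with KL-relaxed marginals. Hyperparameters are assumed.
import torch
import torch.nn.functional as F


def uot_distance(vis: torch.Tensor, txt: torch.Tensor,
                 eps: float = 0.1, rho: float = 0.5, n_iters: int = 50) -> torch.Tensor:
    """vis: (N, D) visual features, txt: (M, D) text features; returns a scalar cost."""
    vis, txt = F.normalize(vis, dim=-1), F.normalize(txt, dim=-1)
    cost = 1.0 - vis @ txt.T                                # (N, M) cosine cost
    a = torch.full((vis.size(0),), 1.0 / vis.size(0))       # uniform source marginal
    b = torch.full((txt.size(0),), 1.0 / txt.size(0))       # uniform target marginal
    K = torch.exp(-cost / eps)                               # Gibbs kernel
    u, v = torch.ones_like(a), torch.ones_like(b)
    fi = rho / (rho + eps)                                   # exponent from the KL relaxation
    for _ in range(n_iters):
        u = (a / (K @ v)).pow(fi)                            # soft row-marginal update
        v = (b / (K.T @ u)).pow(fi)                          # soft column-marginal update
    plan = u[:, None] * K * v[None, :]                       # (approximate) UOT plan
    return (plan * cost).sum()                               # transport cost under the plan


print(uot_distance(torch.randn(196, 768), torch.randn(12, 768)))
```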
Submission Number: 89