Submission Track: Short papers presenting ongoing research or work submitted to other venues (up to 5 pages, excluding references)
Keywords: digital pathology, parameter-efficient fine-tuning, vision–language models, alignment
TL;DR: We benchmark PEFT for adapting vision–language models in pathology: with full data, adapted generalist VLMs reach near parity with pathology foundation models, but gaps persist in few-shot settings. A novel neuropathology dataset reveals the need for better interpretability and multimodal reasoning.
Abstract: Generalist vision–language models (VLMs) struggle on histopathology tasks due to domain gaps and scarce labels. Pathology foundation models (PFMs) also fall short despite costly pretraining. Parameter-efficient fine-tuning (PEFT) offers a scalable, lightweight approach for quickly adapting large pretrained models to target histopathology tasks. We present the first benchmark of PEFT methods applied to VLMs/PFMs for histopathology tasks. We categorize existing PEFT methods by adaptation modality, strategy, and locus. We curate a novel neuropathology dataset for detecting neurofibrillary tangles (NFTs), a hallmark of Alzheimer's disease, capturing annotator variability to evaluate reliability and alignment. Experiments across prostate cancer, colorectal cancer, and neuropathology tasks show that, with full data, PEFT-adapted generalist VLMs rival adapted PFMs, but they fall short in few-shot settings due to label scarcity, terminology mismatch, and modality-specific biases. Visualization further reveals that models such as CONCH+MMRL focus on NFTs within annotated boxes, improving interpretability in single-NFT cases, but their performance diminishes in complex multi-NFT scenarios. Together, our benchmark and dataset highlight PEFT as a scalable adaptation strategy while indicating the need for richer interpretability metrics and improved multimodal reasoning to handle complex cases.
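To make the PEFT setup concrete, below is a minimal sketch of one common PEFT method (LoRA) applied to a generalist CLIP-style vision–language backbone, using the HuggingFace peft library. The backbone checkpoint, adapter rank, and target modules here are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: LoRA-style PEFT on a generalist vision-language backbone.
# Assumptions: HuggingFace `transformers` + `peft`; the checkpoint and
# hyperparameters are illustrative, not the benchmark's actual setup.
from transformers import CLIPModel
from peft import LoraConfig, get_peft_model

# Load a pretrained generalist VLM (illustrative checkpoint).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

# Inject low-rank adapters into the attention projections; only these
# small matrices are trained while the backbone stays frozen.
config = LoraConfig(
    r=8,                                  # adapter rank (assumed)
    lora_alpha=16,                        # adapter scaling (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections in CLIP
    lora_dropout=0.1,
)
peft_model = get_peft_model(model, config)

# Typically well under 1% of all parameters are trainable,
# which is what makes PEFT lightweight and scalable.
peft_model.print_trainable_parameters()
```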
Submission Number: 63