Novel Finetuning Strategies for Adapting Biomedical Vision Language Models to Organ-Centered Pathology Microscopy Tasks

Published: 09 Oct 2025, Last Modified: 09 Oct 2025, NeurIPS 2025 Workshop Imageomics, CC BY 4.0
Submission Track: Short papers presenting ongoing research or work submitted to other venues (up to 5 pages, excluding references)
Keywords: finetuning, Low-Rank Adaptation (LoRA), pathology microscopy, weight interpolation, biomedical vision-language model
TL;DR: A novel fine-tuning method that combines PEFT techniques such as LoRA with weight-interpolation (model-souping) techniques to improve the generalization of biomedical vision-language models on pathology microscopy images.
Abstract: Biomedical vision-language models (VLMs) suffer performance deterioration on earlier domains after fine-tuning, and generalize poorly under domain diversity and dataset imbalance. We propose an adapter-level framework that combines Low-Rank Adaptation (LoRA) for efficient domain-specific tuning with model souping for cross-domain adaptability on microscopy images. Using BioMedCLIP and organ-specific domains from $\mu$-Bench, adapter soups mitigate poor generalization and improve robustness, achieving gains of up to 15\% on fine-grained and 38\% on coarse-grained tasks over baseline BioMedCLIP. The process is data- and resource-efficient, and hyperparameter analysis reveals sensitivities to domain similarity and dataset imbalance. Adapter merging offers a lightweight, scalable approach for organ-specific accuracy and cross-domain stability in biomedical VLMs.
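The core idea of the abstract (training one LoRA adapter per organ domain, then merging them via weight interpolation, i.e. a "soup" at the adapter level) can be sketched as follows. This is a minimal illustration under assumed conventions, not the authors' implementation: the function names, the rank/scaling parameters, and the uniform-averaging choice are all hypothetical, and real adapter soups would operate per-layer inside the VLM rather than on a single weight matrix.

```python
import numpy as np

def lora_delta(A, B, alpha, r):
    # Effective LoRA weight update: dW = (alpha / r) * B @ A,
    # where A is (r x d_in) and B is (d_out x r).
    return (alpha / r) * (B @ A)

def adapter_soup(adapters, weights=None):
    # Interpolate several domain-specific LoRA adapters into one
    # update by (optionally weighted) averaging of their deltas.
    # `adapters` is a list of dicts with keys A, B, alpha, r (hypothetical).
    n = len(adapters)
    if weights is None:
        weights = [1.0 / n] * n  # uniform soup
    return sum(w * lora_delta(**a) for w, a in zip(weights, adapters))

# Toy example: two rank-2 adapters for a 4x4 base weight matrix.
rng = np.random.default_rng(0)
adapters = [
    dict(A=rng.normal(size=(2, 4)), B=rng.normal(size=(4, 2)), alpha=4, r=2)
    for _ in range(2)
]
W_base = np.eye(4)
W_merged = W_base + adapter_soup(adapters)  # souped model weight
```

Weighted (non-uniform) interpolation would let the merge favor adapters from domains more similar to the target, which is where the paper's reported sensitivity to domain similarity and dataset imbalance would enter.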
Submission Number: 64