Cost-effective instruction learning for pathology vision and language analysis

Published: 01 Jan 2025, Last Modified: 12 Nov 2025. Nat. Comput. Sci. 2025. License: CC BY-SA 4.0
Abstract: The advent of vision–language models fosters interactive conversations between artificial intelligence-enabled models and humans. However, applying these models in the clinic faces challenges related to large-scale training data as well as financial and computational resources. Here we propose CLOVER, a cost-effective instruction learning framework for conversational pathology. CLOVER trains a lightweight module and uses instruction tuning while freezing the parameters of the large language model. Instead of using costly GPT-4, we propose well-designed prompts on GPT-3.5 for building generation-based instructions, emphasizing the utility of pathological knowledge derived from Internet sources. We also construct a high-quality set of template-based instructions in the context of digital pathology. Using two benchmark datasets, our findings reveal the strength of hybrid-form, pathological visual question–answer instructions. CLOVER outperforms baselines that possess 37 times more training parameters and exhibits few-shot capacity on an external clinical dataset. CLOVER could thus accelerate the adoption of rapid conversational applications in digital pathology.

Training foundation models often requires a large budget and extensive computational resources. In this study, a low-cost instruction learning framework is proposed that could enable the rapid adoption of vision–language pathology applications.