MINT: Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications

Published: 11 Jun 2025, Last Modified: 18 Jul 2025 · GenBio 2025 Spotlight · CC BY 4.0
Keywords: Multimodal Models, Large Language Models, Rare Disease Prediction, Tissue Type Classification, Preference Optimization
TL;DR: A framework that transfers knowledge from multimodal biomedical models to unimodal LLMs through preference optimization, improving rare disease prediction and tissue classification without requiring multimodal inputs at inference time.
Abstract: The scarcity of high-quality multimodal biomedical data limits the effective fine-tuning of Large Language Models (LLMs) for specialized tasks. We introduce MINT (Multimodal Integrated kNowledge Transfer), a framework that aligns unimodal decoder-only models with domain-specific patterns from multimodal biomedical data through preference optimization, primarily implemented using the Odds Ratio Preference Optimization (ORPO) framework. MINT leverages upstream multimodal machine learning models to transfer domain expertise to downstream text-only or image-only LLMs, as demonstrated in two applications: (1) rare genetic disease prediction from text; (2) tissue type classification using cell nucleus images. In both cases, MINT-based models outperform those enhanced with alternative approaches such as supervised fine-tuning and retrieval-augmented generation, even surpassing much larger foundation models in some scenarios. Our study highlights how MINT effectively grafts the classification strengths of encoder-only models into large decoder-only models, enhancing reasoning abilities and reducing hallucination in biomedical applications.
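The ORPO objective named in the abstract combines a standard supervised (NLL) term on the preferred response with an odds-ratio penalty that pushes the model away from the dispreferred one. A minimal sketch of that loss, assuming sequence-level average log-probabilities as inputs (the function and variable names here are illustrative, not from the paper):

```python
import math

def _log_odds(logp: float) -> float:
    # odds(y) = P(y|x) / (1 - P(y|x)), computed from a log-probability
    p = math.exp(logp)
    return math.log(p / (1.0 - p))

def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Sketch of an ORPO-style loss for one (chosen, rejected) response pair.

    logp_chosen / logp_rejected: average per-token log-probabilities the
    model assigns to the preferred and dispreferred responses.
    lam: weight on the odds-ratio term (a hyperparameter; 0.1 is arbitrary).
    """
    # Supervised term: negative log-likelihood of the preferred response
    nll = -logp_chosen
    # Odds-ratio term: -log sigmoid(log(odds_chosen / odds_rejected)),
    # small when the chosen response is much more likely than the rejected one
    log_or = _log_odds(logp_chosen) - _log_odds(logp_rejected)
    or_term = -math.log(1.0 / (1.0 + math.exp(-log_or)))
    return nll + lam * or_term
```

Because the penalty depends only on the policy's own odds, no frozen reference model is needed, which is one reason ORPO is a convenient vehicle for transferring preferences distilled from an upstream multimodal teacher.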
Submission Number: 8