LLaMA32-Med: Parameter-Efficient Adaptation of Multimodal LLMs for Medical Visual Question Answering
Keywords: Medical Artificial Intelligence, Multimodal Large Language Models (MLLMs), Parameter-Efficient Fine-Tuning (PEFT), Medical VQA, Clinical Applications
Abstract: Artificial intelligence has shown great promise in healthcare, particularly in diagnostic support. However, most existing models rely on unimodal inputs and struggle to leverage multimodal clinical data. Although recent Multimodal Large Language Models (MLLMs) exhibit strong potential, their performance in medical scenarios is constrained by training on general-domain data and the high computational cost of full-parameter adaptation.
In this work, we present a two-stage lightweight adaptation framework for fine-tuning general-purpose MLLMs on medical multimodal tasks. Building on the LLaMA 3.2 Vision-Instruct model, we adopt parameter-efficient fine-tuning techniques that update less than 2% of the model parameters, enabling domain-specific medical knowledge injection while requiring only about 20 GB of GPU memory. We further design task-specific and role-based prompting strategies to better guide the model on medical visual understanding tasks (a minimal sketch of this kind of setup is shown below).
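As an illustration of the kind of setup described above, the following is a minimal sketch of parameter-efficient adaptation of LLaMA 3.2 Vision-Instruct with LoRA adapters and a role-based prompt, using Hugging Face transformers and peft. The LoRA rank, target modules, and prompt wording are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: LoRA-based PEFT on LLaMA 3.2 Vision-Instruct plus a
# role-based medical VQA prompt. Hyperparameters and prompt text are
# assumed for illustration, not taken from the paper.
import torch
from transformers import MllamaForConditionalGeneration, AutoProcessor
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# LoRA adapters on the attention projections keep the trainable share small
# (on the order of 1-2% of total parameters); r=16 is an assumed value.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the trainable parameter fraction

# Role-based prompt for a medical VQA example (wording is hypothetical).
messages = [
    {"role": "system",
     "content": "You are a radiologist answering questions about medical images."},
    {"role": "user",
     "content": [
         {"type": "image"},
         {"type": "text", "text": "Is there evidence of pleural effusion?"},
     ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
```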
Experimental results show that our approach achieves performance comparable to, or surpassing, that of state-of-the-art methods, while significantly outperforming the original general-domain model. Comparative evaluations with recent MLLMs highlight the strong adaptability of the LLaMA 3.2 Vision-Instruct backbone, validating its effectiveness as a foundation for multimodal medical AI systems in practical settings.
Primary Subject Area: Transfer Learning and Domain Adaptation
Secondary Subject Area: Application: Radiology
Registration Requirement: Yes
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 389