FeLoRA16-SPP: Parameter-Efficient Fine-Tuning of 3D Multimodal LLMs for Radiology Report Generation and Visual Question Answering

31 Aug 2025 (modified: 01 Sept 2025) | MICCAI 2025 Challenge FLARE Submission | License: CC BY 4.0
Keywords: 3D CT scans, GREEN, PEFT, Radiology report generation, VQA
TL;DR: FeLoRA16-SPP, a lightweight 3D MLLM that combines LoRA fine-tuning with a spatial pooling projector, efficiently improves organ-level radiology report generation.
Abstract: Multimodal Large Language Models (MLLMs) are emerging as powerful tools for automating radiology report generation (RRG) and visual question answering (VQA) from 3D CT scans. In this work, we present FeLoRA16-SPP, a lightweight adaptation of the M3D-LaMed baseline that combines LoRA-based parameter-efficient fine-tuning with a spatial pooling projector while keeping the vision encoder frozen for efficiency. We evaluate FeLoRA16-SPP on the FLARE25 MICCAI challenge using the GREEN score, the challenge's official metric for organ-level report completeness. Our method improves the GREEN score by up to 12% over PHI3 and by 4% over Med3DVLM, reaching an average score of 0.431 across 18 organ systems. FeLoRA16-SPP achieves the top score for 9 of these organ systems, including the respiratory tract (0.7901), kidneys (0.3386), biliary system (0.6344), pancreas (0.6173), and lymphatic system (0.6600). These results demonstrate that parameter-efficient adaptations of 3D MLLMs can provide clinically meaningful improvements in structured radiology report generation without full retraining.
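The two architectural ingredients named above, a LoRA adapter on a frozen weight and a spatial pooling projector over 3D vision tokens, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the names `LoRALinear` and `spatial_pool_project`, the token grid (8×16×16), the pooling kernel (2×2×2), the rank of 16, the scaling factor and the hidden sizes are all hypothetical choices for illustration. The projector here is assumed to be a plain non-overlapping 3D average pool followed by one linear projection, which may differ from the projector used in FeLoRA16-SPP or M3D-LaMed.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A (hypothetical sketch)."""

    def __init__(self, weight, rank=16, alpha=32, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        d_out, d_in = weight.shape
        self.weight = weight                                # frozen pretrained weight, never updated
        self.scale = alpha / rank
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, rank))                    # trainable, zero-init: output starts equal to frozen layer

    def __call__(self, x):
        # x: (..., d_in) -> (..., d_out); frozen path plus scaled low-rank path
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        # only A and B would receive gradients; W stays frozen
        return self.A.size + self.B.size


def spatial_pool_project(tokens, grid, kernel, proj):
    """Average-pool (N, C) tokens laid out on a (D, H, W) grid, then project to the LLM width."""
    d, h, w = grid
    kd, kh, kw = kernel
    if d % kd or h % kh or w % kw:
        raise ValueError("grid must be divisible by the pooling kernel")
    c = tokens.shape[1]
    vol = tokens.reshape(d, h, w, c)
    # split each axis into (blocks, kernel) and average inside each non-overlapping block
    vol = vol.reshape(d // kd, kd, h // kh, kh, w // kw, kw, c)
    pooled = vol.mean(axis=(1, 3, 5)).reshape(-1, c)
    return pooled @ proj.T                                  # (N / (kd*kh*kw), d_llm)


# Usage with random stand-ins for frozen-encoder tokens (hypothetical sizes)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8 * 16 * 16, 64))                 # 2048 tokens, 64 channels
proj = rng.normal(size=(128, 64))                           # projector to a 128-d "LLM" space
out = spatial_pool_project(tokens, (8, 16, 16), (2, 2, 2), proj)  # 2048 -> 256 tokens

W = rng.normal(size=(128, 128))                             # stand-in for a frozen LLM weight
layer = LoRALinear(W, rank=16, alpha=32, rng=rng)
y = layer(out)
print(out.shape, layer.num_trainable(), W.size)
```

With the zero-initialised `B`, the adapted layer initially reproduces the frozen layer exactly, and only `rank * (d_in + d_out)` parameters (4096 here versus 16384 in `W`) would be trained. Pooling by 2×2×2 cuts the visual sequence passed to the language model by a factor of 8, which is the kind of efficiency gain a spatial pooling projector is meant to provide.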
Submission Number: 14