Look \& Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation
Abstract: Recent advancements in multimodal Large Language Models (LLMs) have significantly enhanced the automation of medical image analysis, particularly in generating radiology reports from chest X-rays (CXR). However, these models still suffer from hallucinations and clinically significant errors, limiting their reliability in real-world applications. In this study, we propose Look \& Mark (L\&M), a novel grounding fixation strategy that integrates radiologist eye fixations (Look) and bounding box annotations (Mark) into the LLM prompting framework. Unlike conventional fine-tuning, L\&M leverages in-context learning to achieve substantial performance gains without retraining.
When evaluated across multiple domain-specific and general-purpose models, L\&M demonstrates significant gains, including a 1.2\% improvement in overall metrics (A.AVG) for CXR-LLaVA compared to baseline prompting and a 9.2\% improvement for LLaVA-Med. General-purpose models also benefit from L\&M combined with in-context learning, with LLaVA-OV achieving an 87.3\% clinical average performance (C.AVG), the highest among all models and above even those explicitly trained for CXR report generation. Expert evaluations further confirm that L\&M reduces clinically significant errors, such as false predictions and omissions, by an average of 0.43 errors per report, enhancing both accuracy and reliability. These findings highlight L\&M's potential as a scalable and efficient solution for AI-assisted radiology, paving the way for improved diagnostic workflows in low-resource clinical settings.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Multimodality and Language Grounding to Vision, Efficient/Low-Resource Methods for NLP, Generation, NLP Applications, Human-Centered NLP
Contribution Types: Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 4265