RadEyeVideo: Enhancing general-domain Large Vision Language Model for chest X-ray analysis with video representations of eye gaze

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: LVLM, Eye Gaze, Video, Medicine, Medical Image, Chest X-ray, Chest X-ray Report Generation
TL;DR: Radiologists' eye-fixation data as a video sequence, capturing both the temporal and spatial dynamics of their gaze, for chest X-ray report generation
Abstract: Large Vision-Language Models (LVLMs) have demonstrated promising performance in chest X-ray (CXR) analysis. To enhance human-computer interaction, several studies have incorporated radiologists' eye gaze, typically through heatmaps or textual prompts. However, these methods often overlook the sequential order of eye movements, which could provide valuable insights by highlighting both the areas of interest and the order in which they are examined. In this work, we propose a novel approach called RadEyeVideo that integrates radiologists' eye-fixation data as a video sequence, capturing both the temporal and spatial dynamics of their gaze. The video, featuring a red gaze point overlaid on CXR images, emphasizes regions of focused attention during interpretation. We evaluate this method on CXR report generation and disease diagnosis using three general-domain, open-source LVLMs with video input capabilities. When prompted with eye-gaze videos, model performance improves by up to 25.4% on the impression generation task and by an average of 7.9% across all tasks under scaled evaluation metrics. LVLMs enhanced by our approach, when combined with exemplar reports for in-context learning, outperform medical-domain models as well as models specifically trained for CXR report generation on the benchmark dataset. This work highlights that domain experts' knowledge (eye-gaze information in this case), when effectively integrated with LVLMs, can significantly enhance general-domain models' capabilities in clinical tasks, pointing to an effective new approach to utilising LVLMs in healthcare and beyond.
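The abstract does not specify the rendering pipeline, but the core idea (a red gaze point overlaid frame by frame on the CXR image, preserving fixation order) can be sketched as follows. This is a minimal illustration assuming fixations arrive as (x, y) pixel coordinates sorted by onset time; the function name, file paths, frame rate, and coordinates are hypothetical, and OpenCV is used only as one convenient way to write video frames.

```python
import cv2

def fixations_to_video(cxr_path, fixations, out_path="gaze.mp4",
                       fps=8, radius=12):
    """Render a gaze video: one frame per fixation, in temporal order,
    with a filled red point overlaid on the CXR at the fixated location."""
    image = cv2.imread(cxr_path)           # grayscale CXR loads as 3-channel BGR
    h, w = image.shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (w, h))  # VideoWriter expects (width, height)
    for x, y in fixations:                 # fixations ordered by onset time
        frame = image.copy()
        cv2.circle(frame, (int(x), int(y)), radius,
                   (0, 0, 255), -1)        # red in BGR; -1 fills the circle
        writer.write(frame)
    writer.release()

# Illustrative call with made-up pixel coordinates:
fixations_to_video("cxr.png", [(320, 240), (355, 262), (500, 410)])
```

A longer dwell time on a region could be represented by repeating that fixation's frame, so the video's temporal extent reflects both where and how long the radiologist looked; the abstract does not state whether the authors do this.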
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12178