Human-in-the-Loop Chest X-Ray Diagnosis: Enhancing Large Multimodal Models with Eye Fixation Inputs

Published: 01 Jan 2024 · Last Modified: 19 Feb 2025 · TAI4H 2024 · CC BY-SA 4.0
Abstract: In the realm of artificial intelligence-assisted diagnostics, recent advances in foundation models have shown great promise, particularly in medical image computing. However, human-computer interaction with these models is often limited to inputting images and text prompts. In this study, we propose a novel human-in-the-loop approach to chest X-ray diagnosis with a large language and vision assistant using eye fixation prompts. The eye fixation prompts encode the location and duration of a radiologist’s attention during chest X-ray analysis. The assistant interacts with a radiologist in two ways: recommending possible disease diagnoses and confirming diagnostic reports. Our results show that augmenting the interaction with eye fixation prompts significantly improves the large multimodal model’s accuracy in both differential diagnosis and report confirmation. Fine-tuning on just 658 fixation-annotated reports further boosts the performance of LLaVA-1.5, surpassing the previous state-of-the-art model LLaVA-ERR, which was trained on 17k MIMIC reports, by 5%. Our study highlights that this approach can better assist radiologists in clinical decision-making through a reciprocal interaction in which the models also benefit from radiologists’ domain expertise.
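The abstract describes fixation prompts as carrying the location and duration of a radiologist's attention, but does not publish the exact encoding. Below is a minimal, hypothetical sketch of one way such fixations could be serialized into a text prompt for a LLaVA-style model; the names `Fixation` and `build_fixation_prompt`, the normalized-coordinate format, and the dwell-time ranking are all illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch: serializing eye-tracking fixations into a text
# prompt for a LLaVA-style multimodal model. The field names, the
# coordinate convention, and the prompt wording are assumptions; the
# paper's actual prompt format is not given in the abstract.
from dataclasses import dataclass
from typing import List


@dataclass
class Fixation:
    x: float             # normalized horizontal position in the X-ray, 0..1
    y: float             # normalized vertical position in the X-ray, 0..1
    duration_ms: float   # how long the radiologist dwelt on this point


def build_fixation_prompt(fixations: List[Fixation], question: str) -> str:
    """Append the radiologist's fixation locations and durations to the
    text prompt so the model can condition on where attention went."""
    # Rank by dwell time so the most-attended regions are listed first.
    ranked = sorted(fixations, key=lambda f: f.duration_ms, reverse=True)
    lines = [
        f"- region (x={f.x:.2f}, y={f.y:.2f}), dwell {f.duration_ms:.0f} ms"
        for f in ranked
    ]
    return (
        "The radiologist's gaze fixations on this chest X-ray were:\n"
        + "\n".join(lines)
        + f"\n\n{question}"
    )


if __name__ == "__main__":
    fixations = [
        Fixation(x=0.62, y=0.35, duration_ms=1400.0),  # long dwell, e.g. an opacity
        Fixation(x=0.40, y=0.70, duration_ms=600.0),   # brief glance
    ]
    print(build_fixation_prompt(fixations, "What are the possible diagnoses?"))
```

The resulting string would be passed alongside the X-ray image as the model's text input, letting the fixation signal steer the model toward the regions the radiologist actually examined.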