Heatmap-informed Direct Preference Optimization for Mitigating Hallucinations in Medical LVLMs on Subtle Lesions
Keywords: Medical Large Vision-Language Model, Direct Preference Optimization, Heatmap
Abstract: Medical large vision-language models (Med-LVLMs) have shown strong capabilities in clinical tasks such as medical VQA and report generation, but they remain prone to hallucinations, i.e., textual output inconsistent with the corresponding images, which can lead to misdiagnoses or overlooked findings. Existing Direct Preference Optimization (DPO) methods rely on coarse-grained vision-language alignment and synthetic text-based preference data, and thus often fail to capture subtle lesions: hallucinations frequently arise from insufficient fine-grained alignment and from preference data that do not faithfully reflect visual content. To address these challenges, we propose Heatmap-informed Direct Preference Optimization (HDPO), which integrates lesion-level heatmaps to mitigate hallucinations of Med-LVLMs on subtle lesions. HDPO leverages heatmaps to guide preference-data curation by explicitly modeling misdiagnoses, false positives, and false negatives, and employs a lesion-weighted DPO loss that emphasizes clinically salient regions, enabling fine-grained visual-textual alignment and improved analysis of subtle lesions. Extensive experiments on four radiology datasets demonstrate that HDPO consistently outperforms recent baselines, achieving up to a 3\% improvement in VQA accuracy and 2\% gains in report-generation metrics, particularly on subtle lesions, confirming its effectiveness in reducing hallucinations and enhancing factual accuracy in Med-LVLMs.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 15845