Improving Vision-LLMs with Human Cognitive Signals

Published: 22 Sept 2025, Last Modified: 27 Oct 2025 · WiML @ NeurIPS 2025 · CC BY 4.0
Keywords: Vision–Language Models (VLMs), Eye-Tracking (ET), Human Alignment, Direct Preference Optimization (DPO), Saliency Prediction, Hallucination Mitigation
Abstract: We propose to collect eye-tracking data during preference annotation to build a Vision–Large Language Model (VLLM) alignment dataset and to train a saliency predictor of human visual attention. We will compare real versus synthetic gaze supervision and evaluate attention editing combined with Direct Preference Optimization (DPO) to reduce hallucinations and improve alignment with human judgments.
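The abstract combines two training signals: a preference objective (DPO) and gaze-based supervision of visual attention. The sketch below is not the authors' implementation; it is a minimal illustration, assuming a PyTorch setup, of how the standard DPO loss could be paired with a hypothetical gaze-alignment penalty that compares the model's image-token attention to a human (or synthetic) saliency map. The function names, tensor shapes, and the 0.5 weighting are illustrative assumptions.

```python
# Illustrative sketch only; not the submission's method.
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss on preference pairs (Rafailov et al., 2023)."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


def gaze_alignment_loss(attn: torch.Tensor, gaze: torch.Tensor) -> torch.Tensor:
    """KL(gaze || attention) over image patches: penalizes patches the human
    fixated on but the model's attention distribution ignores.
    Both inputs are assumed to be non-negative maps of shape (batch, H, W)."""
    attn = attn.flatten(1).clamp_min(1e-8)
    gaze = gaze.flatten(1).clamp_min(1e-8)
    attn = attn / attn.sum(dim=-1, keepdim=True)
    gaze = gaze / gaze.sum(dim=-1, keepdim=True)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(attn.log(), gaze, reduction="batchmean")


if __name__ == "__main__":
    # Random stand-in values: 4 preference pairs, attention/gaze over a 24x24 grid.
    b = 4
    pol_c, pol_r = torch.randn(b), torch.randn(b)
    ref_c, ref_r = torch.randn(b), torch.randn(b)
    attn, gaze = torch.rand(b, 24, 24), torch.rand(b, 24, 24)
    total = dpo_loss(pol_c, pol_r, ref_c, ref_r) + 0.5 * gaze_alignment_loss(attn, gaze)
    print(total.item())
```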
Submission Number: 219