Keywords: Vision–Language Models (VLMs), Eye-Tracking (ET), Human Alignment, Direct Preference Optimization (DPO), Saliency Prediction, Hallucination Mitigation
Abstract: We propose to collect eye-tracking data during preference annotation to build a Vision–Large Language Model (VLLM) alignment dataset and to train a saliency predictor of human attention. We will compare real versus synthetic gaze supervision and evaluate attention editing together with Direct Preference Optimization (DPO) to reduce hallucinations and improve human alignment.
Submission Number: 219