Abstract: Eye-tracking metrics offer valuable insights into human visual attention during language comprehension, yet existing eye-tracking corpora lack diverse machine-generated text samples. To bridge this gap, we introduce Gaze Responses for Evaluating AI Texts (GREAT), a comprehensive dataset and software framework that captures human eye-movement patterns during on-screen reading of passages generated by large language models (LLMs). The dataset includes raw eye-movement recordings, reading-time measures, and post-reading evaluations for pairs of LLM-generated passages from MT-Bench, alongside rigorous validation metrics. The collected eye-tracking metrics show strong explanatory power in predicting text quality. When combined with negative log-likelihood (NLL), a commonly used metric for evaluating text quality, they substantially improve model performance across all standard statistical criteria. These findings indicate that eye-tracking data effectively complement probabilistic metrics, improving predictive accuracy for text quality assessment. The full dataset and part of the processing code are publicly available at https://anonymous.4open.science/r/eye-track.
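The claim that gaze features improve on NLL alone amounts to comparing nested models. The abstract names neither the model family nor the criteria, so the sketch below makes several assumptions. It assumes ordinary least-squares regression, with R², adjusted R², log-likelihood, AIC, and BIC as the "standard statistical criteria". The feature names `reading_time` and `fixation_count` are hypothetical, and the data are synthetic stand-ins, not values from GREAT.

```python
import numpy as np


def fit_ols(X, y):
    """Fit ordinary least squares with an intercept and return fit statistics."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])          # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least-squares coefficients
    resid = y - X1 @ beta
    rss = float(resid @ resid)                     # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())       # total sum of squares
    k = X1.shape[1]                                # number of regression coefficients
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    # Gaussian log-likelihood evaluated at the ML noise variance rss / n
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1.0)
    p = k + 1                                      # coefficients plus the noise variance
    return {
        "r2": r2,
        "adj_r2": adj_r2,
        "loglik": loglik,
        "aic": 2 * p - 2 * loglik,
        "bic": p * np.log(n) - 2 * loglik,
    }


# Synthetic stand-in data (NOT from GREAT): one row per passage.
rng = np.random.default_rng(0)
n = 200
nll = rng.normal(size=n)             # hypothetical per-passage mean NLL
reading_time = rng.normal(size=n)    # hypothetical gaze feature
fixation_count = rng.normal(size=n)  # hypothetical gaze feature
quality = (-0.5 * nll - 0.8 * reading_time + 0.3 * fixation_count
           + rng.normal(scale=0.5, size=n))

# Baseline uses NLL only; the full model adds the gaze features.
base = fit_ols(nll.reshape(-1, 1), quality)
full = fit_ols(np.column_stack([nll, reading_time, fixation_count]), quality)

for name in ("r2", "adj_r2", "loglik", "aic", "bic"):
    print(f"{name:7s} NLL only: {base[name]:9.3f}   NLL + gaze: {full[name]:9.3f}")
```

Higher R², adjusted R², and log-likelihood indicate a better fit. Lower AIC and BIC do too. Because the full model nests the baseline, R² and log-likelihood cannot decrease when the gaze features are added. The penalized criteria (adjusted R², AIC, BIC) are therefore the more informative checks that the extra features earn their parameters.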
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: Eye-tracking, Machine-generated text, Comprehension signals
Contribution Types: Data resources
Languages Studied: English
Submission Number: 3526