A Living Overview of Eye-Tracking-while-Reading Datasets with Open Library Support

Published: 14 Jun 2026, Last Modified: 14 Jun 2026
Venue: MultiplEYE 2026 Poster
License: CC BY 4.0
Keywords: eye-tracking, datasets, review, interoperability, open source
TL;DR: We present a living eye-tracking-while-reading dataset review with open library support
Abstract: Eye-tracking-while-reading corpora are a valuable resource for a wide range of disciplines and applications. These applications range from investigating the cognitive processes involved in reading to machine-learning-based uses, such as gaze-driven assessments of reading comprehension (e.g., Rayner et al., 2006; Ahn et al., 2020; Reich et al., 2025). In recent decades, both the number and the size of eye-tracking-while-reading datasets have grown, along with their diversity in terms of stimulus languages, participants’ linguistic backgrounds, and the inclusion of psychometric or demographic information. However, because the data are scattered across different disciplines and common data-sharing standards are absent, many existing datasets are difficult to reuse due to limited interoperability. To address the lack of transparency and clarity with regard to existing datasets and their features across disciplines, we present a living dataset review with open library support, available at https://dili-lab.github.io/datasets.html. The review documents existing datasets along a wide range of features, including the number of participants and items, a description of the stimuli, and information on available data formats, among many others. Because the review is living, new datasets can be added as their authors create them, and entries with incomplete or missing information can be corrected by sending an edit request. Furthermore, all publicly available datasets have been integrated into the Python package pymovements (Krakowczyk et al., 2023), which offers an eye-tracking datasets library (Krakowczyk et al., 2025).
The living overview and the pymovements integration simplify the sharing of new datasets, make existing datasets more visible and their features transparent, and thereby support the FAIR (Findable, Accessible, Interoperable, Reusable) principles (Wilkinson et al., 2016) and encourage the reproduction and replication of studies as good scientific practice.
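As a rough illustration of how a feature-based overview of this kind can support dataset selection, the following minimal sketch models one overview entry and filters for publicly available datasets. The schema, the field names, the helper `select_reusable`, and all three entries are hypothetical and invented for this example; they are not the review's actual table layout and not part of the pymovements API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class DatasetEntry:
    """One row of a dataset overview (hypothetical schema, for illustration only)."""
    name: str
    n_participants: int
    n_items: int
    stimulus_language: str
    data_formats: Tuple[str, ...] = ()
    publicly_available: bool = False


def select_reusable(
    entries: Iterable[DatasetEntry],
    language: Optional[str] = None,
    required_format: Optional[str] = None,
) -> List[str]:
    """Return names of publicly available entries that match the optional filters."""
    selected = []
    for entry in entries:
        if not entry.publicly_available:
            continue  # only public datasets can be reused directly
        if language is not None and entry.stimulus_language != language:
            continue
        if required_format is not None and required_format not in entry.data_formats:
            continue
        selected.append(entry.name)
    return selected


# Placeholder entries with made-up values; they do not describe real datasets.
EXAMPLE_ENTRIES = [
    DatasetEntry("ExampleCorpusA", 40, 120, "en", ("raw", "fixations"), True),
    DatasetEntry("ExampleCorpusB", 25, 60, "de", ("fixations",), True),
    DatasetEntry("ExampleCorpusC", 80, 200, "en", ("raw",), False),
]

if __name__ == "__main__":
    print(select_reusable(EXAMPLE_ENTRIES, language="en"))
    print(select_reusable(EXAMPLE_ENTRIES, required_format="fixations"))
```

Recording availability and data formats as explicit fields is what makes a query like this possible, which is the kind of transparency the overview aims to provide.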
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Submission Type: Poster presentation
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 17