Keywords: eye-tracking, datasets, review, interoperability, open source
TL;DR: We present a living eye-tracking-while-reading dataset review with open library support
Abstract: Eye-tracking-while-reading corpora are a valuable resource for a wide range of disciplines and applications.
These applications range from investigating the cognitive processes involved in reading to machine-learning-based
uses such as gaze-driven assessment of reading comprehension (e.g., Rayner et al., 2006; Ahn et al., 2020;
Reich et al., 2025). In recent decades, eye-tracking-while-reading datasets have grown in both number and size,
and they have become increasingly diverse in terms of stimulus languages, participants’ linguistic backgrounds,
and the inclusion of psychometric or demographic information. However, because the data are distributed across
different disciplines and no common data-sharing standards exist, many existing datasets have limited
interoperability and are therefore difficult to reuse.
To address the lack of transparency and clarity regarding existing datasets and their features across
disciplines, we present a living dataset review with open library support, available at
https://dili-lab.github.io/datasets.html. The review presents existing datasets together with a wide range of
their features, such as the number of participants and items, a description of the stimuli, and the available
data formats. Because the review is living, new datasets can be added as their authors create them, and if
information on a dataset already in the review is incomplete or missing, its entry can be corrected by submitting
an edit request. Moreover, all publicly available datasets have been integrated into the Python package
pymovements (Krakowczyk et al., 2023), which offers an eye-tracking dataset library (Krakowczyk et al., 2025).
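To make the per-dataset features described above more concrete, the following Python sketch shows one possible way to represent review entries as structured records and filter them by criteria such as stimulus language or data format. The class `DatasetEntry`, its field names, the function `filter_entries`, and the placeholder entries are hypothetical illustrations; they are not the review's actual schema and not part of the pymovements API.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    """Hypothetical record for one eye-tracking-while-reading dataset.

    Field names are illustrative assumptions, not the review's actual schema.
    """
    name: str
    n_participants: int
    n_items: int
    stimulus_languages: list[str]
    data_formats: list[str]  # e.g. "raw gaze", "fixations", "reading measures"
    publicly_available: bool = False


def filter_entries(entries, *, language=None, data_format=None, public_only=False):
    """Return the entries that match every given criterion (None means 'any')."""
    result = []
    for entry in entries:
        if language is not None and language not in entry.stimulus_languages:
            continue
        if data_format is not None and data_format not in entry.data_formats:
            continue
        if public_only and not entry.publicly_available:
            continue
        result.append(entry)
    return result


# Placeholder entries with made-up values, for illustration only.
entries = [
    DatasetEntry("corpus_a", 30, 100, ["de"], ["raw gaze", "fixations"], publicly_available=True),
    DatasetEntry("corpus_b", 12, 40, ["en"], ["reading measures"]),
]
print([e.name for e in filter_entries(entries, data_format="fixations", public_only=True)])
```

Keeping the filter criteria keyword-only, with `None` meaning "no constraint", lets a query combine any subset of features without a separate function per feature.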
Together, the living overview and the pymovements integration simplify the sharing of new datasets and make
existing datasets and their features more visible and transparent. They thereby support the FAIR (Findable,
Accessible, Interoperable, Reusable) principles (Wilkinson et al., 2016) and encourage the reproduction and
replication of studies as good scientific practice.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Submission Type: Poster presentation
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 17