Eye-Tracking Driven Dyslexia Detection: A Data-Efficient Approach Using Synthetic Augmentation and XGBoost Classifier
Abstract: Early and accurate detection of Developmental
Learning Disorders (DLD), such as dyslexia, remains a key
challenge due to limited availability of publicly accessible data
and diagnostic complexity. In this study, we propose a data-efficient
dyslexia detection model that uses eye-tracking features extracted
from reading behaviour. To overcome
the challenge of sample scarcity, we augmented the publicly
available Zenodo dataset (70 subjects) with 140 synthetically
generated samples, yielding a balanced dataset of 210 samples
(105 dyslexic, 105 non-dyslexic). We extracted nine
spatiotemporal and behavioural features, including fixation duration,
saccade length, dispersion metrics, and number of lines fixated, and
trained an XGBoost classifier. The model was evaluated through
5-fold cross-validation, achieving a mean accuracy of 85.1%
(±2.7). The final model achieved a training AUC of 0.9928
(accuracy 0.9464) and a test AUC of 0.9819 (accuracy 0.9268).
No overfitting was observed (train-test gap: 0.0178).
Our results show that, by combining synthetic
augmentation with carefully engineered features, ensemble
methods such as XGBoost can support early screening
of dyslexia through eye-movement analysis.
Index Terms—Developmental learning disorders (DLD),
Dyslexia detection, Eye tracking biomarkers, Neuro-cognitive
engineering, Biomedical data engineering, Computational Neuroscience.
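
The evaluation protocol summarised in the abstract (stratified 5-fold cross-validation on a balanced set of 210 samples, reported as mean accuracy with a spread, plus AUC) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the helper names `stratified_kfold` and `roc_auc` are hypothetical, the per-fold accuracies and toy scores are made up purely for illustration, and the sketch assumes that the folds were stratified by class, which the abstract does not state.

```python
import random
from statistics import mean, stdev


def stratified_kfold(labels, k=5, seed=0):
    """Return k lists of test indices with near-equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Placeholder labels matching the class sizes in the abstract (105 + 105).
labels = [1] * 105 + [0] * 105
folds = stratified_kfold(labels, k=5)
fold_sizes = [len(f) for f in folds]
positives_per_fold = [sum(labels[i] for i in f) for f in folds]

# Made-up per-fold accuracies, only to show how "mean (± spread)" is formed.
fold_acc = [0.80, 0.85, 0.90, 0.85, 0.85]
cv_mean, cv_std = mean(fold_acc), stdev(fold_acc)

# Toy scores: three of the four positive/negative pairs are ranked correctly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

With 105 samples per class and five folds, each test fold holds 42 samples, 21 from each class, so every fold keeps the 50/50 balance of the full dataset. In a real pipeline the classifier would be refit on the remaining four folds before scoring each held-out fold.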