Eye-Tracking Driven Dyslexia Detection: A Data-Efficient Approach Using Synthetic Augmentation and XGBoost Classifier

Published: 10 Dec 2025, Last Modified: 22 Feb 2026, IEEE International Conference on Bioinformatics and Bioengineering (BIBE) 2025, Everyone, CC BY-NC 4.0
Abstract: Early and accurate detection of Developmental Learning Disorders (DLD), such as dyslexia, remains a key challenge because publicly accessible data are scarce and diagnosis is complex. In this study, we propose a data-efficient dyslexia detection model that uses eye-tracking features extracted from reading behaviour. To overcome sample scarcity, we augmented the publicly available Zenodo dataset (70 subjects) with 140 synthetically generated samples, forming a balanced dataset of 210 participants (105 dyslexic, 105 non-dyslexic). We extracted nine detailed spatiotemporal and behavioral features, including fixation duration, saccade length, dispersion metrics, and number of lines fixated, and trained an XGBoost classifier. The model was evaluated through 5-fold cross-validation, achieving a mean accuracy of 85.1% (±2.7). The final model achieved a training AUC of 0.9928 with an accuracy of 0.9464, and a test AUC of 0.9819 with an accuracy of 0.9268. No overfitting was observed in the model (train-test gap: 0.0178). Our results show that, by combining synthetic augmentation with meticulously engineered features, advanced ensemble methods such as XGBoost can successfully support early screening of dyslexia through eye-movement analysis.
Index Terms: Developmental learning disorders (DLD), Dyslexia detection, Eye tracking biomarkers, Neuro-cognitive engineering, Biomedical data engineering, Computational Neuroscience.
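To make the evaluation protocol concrete, the following is a minimal sketch of how a 5-fold split over the balanced 210-participant dataset could be laid out. It is an illustration only: the abstract does not state whether the folds were stratified by class, and the function name `stratified_kfold`, the label encoding, and the random seed are assumptions introduced here, not the authors' code.

```python
import random
from collections import Counter


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class balance per fold.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets an equal share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


# Class counts as stated in the abstract; the encoding is assumed:
# 1 = dyslexic, 0 = non-dyslexic.
labels = [1] * 105 + [0] * 105
splits = list(stratified_kfold(labels))
for fold_no, (train, test) in enumerate(splits, start=1):
    print(fold_no, len(train), len(test), Counter(labels[i] for i in test))
```

Under these assumptions each fold trains on 168 samples and holds out 42, with 21 samples per class in every held-out fold. In practice a classifier would be fitted on each `train` index set and scored on the matching `test` set, and the per-fold accuracies averaged to give a mean and spread.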