Abstract: Cognitive impairment is a growing public health concern, with early detection playing a crucial role in improving patient outcomes. The Montreal Cognitive Assessment (MoCA) is widely used for screening mild cognitive impairment (MCI) and early-stage dementia. However, traditional MoCA assessments require manual scoring by trained professionals, making the process labor-intensive, time-consuming, and susceptible to human error. To overcome these limitations, we propose an automated pipeline for MoCA score estimation using eye-gaze data and Vision Transformers (ViTs). Our approach leverages gaze-tracking technology to capture spatial and temporal eye-movement patterns during structured cognitive tasks, identifying subtle cognitive impairments that may otherwise go unnoticed. The raw gaze data is preprocessed and mapped onto task-relevant image regions, where a pretrained ViT extracts high-dimensional feature representations. To address inconsistencies in gaze sampling and improve temporal modeling, we introduce a time-aware positional embedding mechanism that enhances the model's ability to infer cognitive performance. These extracted features are then processed by a transformer-based classification model to predict MoCA scores with high accuracy. We validate our approach using a dataset collected from seven cognitive gaming sessions, demonstrating its effectiveness in automated cognitive assessment. The experimental results indicate that our method provides a reliable and efficient alternative to traditional MoCA evaluations, reducing dependency on human intervention while maintaining diagnostic accuracy.
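
The abstract does not specify how the time-aware positional embedding is constructed. As an illustration only, the sketch below shows one common way such an embedding could be built: a sinusoidal encoding evaluated at each gaze sample's real timestamp rather than at its integer index, so that irregular sampling gaps are reflected in the embedding. The function name `time_aware_positional_embedding`, its parameters, and the choice of a sinusoidal form are all assumptions and may differ from the mechanism used in the paper.

```python
import math


def time_aware_positional_embedding(timestamps, dim, max_period=10000.0):
    """Hypothetical sketch: sinusoidal embedding driven by real timestamps.

    timestamps: gaze sample times in seconds, in ascending order.
    dim:        embedding width (must be even).
    max_period: longest wavelength of the sinusoids (assumed value).
    Returns one list of `dim` floats per timestamp.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    if not timestamps:
        return []

    t0 = timestamps[0]  # encode time relative to the first sample
    embeddings = []
    for t in timestamps:
        elapsed = t - t0
        row = []
        for i in range(dim // 2):
            # Frequencies decay geometrically across channel pairs.
            freq = 1.0 / (max_period ** (2 * i / dim))
            row.append(math.sin(elapsed * freq))
            row.append(math.cos(elapsed * freq))
        embeddings.append(row)
    return embeddings


# Irregularly sampled gaze timestamps (seconds); values are illustrative.
samples = [0.0, 0.016, 0.050, 0.051]
pos = time_aware_positional_embedding(samples, dim=8)
```

In a pipeline like the one described, each row of `pos` would be added to the ViT feature vector of the corresponding gaze sample before the sequence is passed to the transformer-based classifier. Because the encoding depends on elapsed time, two samples taken 1 ms apart receive nearly identical embeddings, whereas an index-based encoding would treat them the same as samples 34 ms apart.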