Predicting Ophthalmologist Gaze Patterns on OCT Data with Masked Autoencoders and Long Short-Term Memory Networks

Tri Le; Kuang Sun; Kaveri A. Thakoor

Published: 25 Sept 2024, Last Modified: 23 Oct 2024, Venue: IEEE BHI'24, License: CC BY 4.0

Keywords: Attention, Deep Learning, Gaze Prediction, LSTM, Ophthalmology, Optical Coherence Tomography, Masked Autoencoder, Vision Transformer

TL;DR: A hybrid LSTM + MAE approach that predicts the gaze patterns of experienced ophthalmologists, to guide novice residents and corroborate expert decisions.

Abstract: Scanpath prediction is crucial in the medical domain because it captures the visual attention patterns of experienced clinicians, offering insight into diagnostic processes and supporting training programs. Understanding where experts look can improve medical image interpretation and decision-making. However, scanpath prediction is extremely challenging because of the inherent noise in eye-tracking data, individual variability among clinicians, and the complexity of medical images. This work introduces a novel adaptation of the “Show, Attend and Tell” (SAT) framework to model the gaze patterns of ophthalmologists reading Optical Coherence Tomography (OCT) reports. Instead of a Convolutional Neural Network (CNN) for visual feature extraction, we use a Masked Autoencoder (MAE) trained with self-supervision: the MAE learns to reconstruct masked regions of OCT images, which lets its encoder produce robust image representations despite the scarcity of labels in medical imaging datasets. We trained a separate LSTM model for each clinician to capture individual inspection patterns. The best-performing model achieved a ScanMatch score of up to 0.5595 and a Pearson correlation of up to 0.866 in predicting expert gaze on OCT reports. We also demonstrate a downstream use case: predicting the sequence of expert-fixated regions on an OCT report and visualizing it for ophthalmic resident education. These findings highlight the framework’s potential to help clinicians understand and emulate expert diagnostic processes, to support the explanation of AI-based predictions in the clinic, and to guide novice residents in ophthalmic education, especially in resource-diverse environments with limited access to expert ophthalmologists or labeled datasets.
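The abstract does not give architectural details, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes a ViT-style MAE encoder that outputs one feature vector per image patch (here a hypothetical 14×14 grid of 196 patches with 64-dim features, replaced by random numbers) and a SAT-style decoder in which additive soft attention, conditioned on the previous LSTM hidden state, feeds an LSTM cell whose new state scores every patch as the next fixation. The names `AttnLSTMDecoder`, `path_to_map`, and `pearson_cc`, all dimensions, the greedy decoding, the zero initial state, and the random untrained weights are illustrative assumptions. It also shows how a Pearson correlation can be computed between a predicted and a recorded fixation-density map.

```python
# Sketch only: names, sizes, and random weights are hypothetical, not the authors' code.
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


class AttnLSTMDecoder:
    """Hypothetical SAT-style scanpath decoder with random (untrained) weights.

    Each step: additive soft attention over patch features, conditioned on the
    previous hidden state, gives a context vector that drives one LSTM step;
    the new hidden state then scores every patch as the next fixation.
    """

    def __init__(self, feat_dim, hidden, attn_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # placeholder init scale; real weights would be learned
        self.hidden = hidden
        self.W_f = rng.normal(0, s, (attn_dim, feat_dim))           # projects patch features
        self.W_h = rng.normal(0, s, (attn_dim, hidden))             # projects previous hidden state
        self.v = rng.normal(0, s, attn_dim)                         # attention scoring vector
        self.W = rng.normal(0, s, (4 * hidden, feat_dim + hidden))  # LSTM gate weights
        self.b = np.zeros(4 * hidden)
        self.W_out = rng.normal(0, s, (feat_dim, hidden))           # hidden state -> patch query

    def attend(self, feats, h):
        # Additive attention: score_i = v^T tanh(W_f a_i + W_h h)
        scores = np.tanh(feats @ self.W_f.T + self.W_h @ h) @ self.v  # shape (N,)
        alpha = softmax(scores)                                      # weights sum to 1
        return alpha, alpha @ feats                                  # (N,), context (D,)

    def step(self, feats, h, c):
        _, ctx = self.attend(feats, h)
        z = self.W @ np.concatenate([ctx, h]) + self.b
        H = self.hidden
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g  # standard LSTM cell-state update
        h = o * np.tanh(c)
        probs = softmax(feats @ (self.W_out @ h))  # distribution over patches
        return probs, h, c

    def predict_scanpath(self, feats, length):
        h, c = np.zeros(self.hidden), np.zeros(self.hidden)  # zero initial state (simplification)
        path = []
        for _ in range(length):
            probs, h, c = self.step(feats, h, c)
            path.append(int(np.argmax(probs)))  # greedy choice of the next fixated patch
        return path


def path_to_map(path, grid=14):
    """Count fixations per patch on a grid x grid layout (a crude fixation-density map)."""
    m = np.zeros(grid * grid)
    np.add.at(m, path, 1.0)  # unbuffered add, so repeated patch indices accumulate
    return m.reshape(grid, grid)


def pearson_cc(pred_map, gt_map):
    """Pearson correlation between two maps, flattened to 1-D."""
    return float(np.corrcoef(pred_map.ravel(), gt_map.ravel())[0, 1])


# Usage: random stand-ins for MAE encoder outputs (196 patches x 64 dims).
feats = np.random.default_rng(1).normal(size=(196, 64))
decoder = AttnLSTMDecoder(feat_dim=64, hidden=32, attn_dim=32)
path = decoder.predict_scanpath(feats, length=8)
print(path)  # patch indices; meaningless until the weights are trained
```

The per-clinician setup described in the abstract could be mirrored by keeping one such decoder per clinician (for example in a dict keyed by a hypothetical clinician ID), each trained only on that clinician's scanpaths. In practice, fixation maps are usually smoothed with a Gaussian before computing the correlation; that step is omitted here. ScanMatch, the other reported metric, compares scanpaths as spatially binned fixation sequences aligned with the Needleman-Wunsch algorithm and is not reproduced in this sketch.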

Track: 4. AI-based clinical decision support systems

Registration Id: JVNNK9NQ6FJ

Submission Number: 360
