Enhancing Post-Treatment Visual Acuity Prediction with Multimodal Deep Learning on Small-scale Clinical and OCT Datasets

Published: 27 Mar 2025, Last Modified: 01 May 2025 | MIDL 2025 Poster | CC BY 4.0
Keywords: Deep Learning, Multimodal Learning, Visual Acuity, Optical Coherence Tomography (OCT), Diabetic Macular Edema
TL;DR: We present a multimodal deep learning approach that predicts post-treatment visual acuity in diabetic macular edema by combining OCT imaging with clinical data, trained on small-scale datasets.
Abstract: Predicting visual acuity (VA) outcomes after treatment in diabetic macular edema (DME) is crucial for optimizing patient management but remains challenging due to the heterogeneity of patient responses and the limited availability of comprehensive datasets. While existing predictive models have shown promise, their clinical deployment is hindered by their reliance on large training datasets that are often unavailable in real-world settings. We address this challenge by developing a multimodal deep learning framework specifically designed for small-scale clinical cohorts. Our approach integrates optical coherence tomography (OCT) images with carefully selected clinical parameters through a cross-modal fusion architecture that leverages attention mechanisms to enhance feature interaction and predictive accuracy. We validate our framework across two clinically distinct real-world cohorts: treatment-naïve patients ($n=35$) receiving intensive anti-VEGF therapy and chronically treated patients ($n=20$) receiving sustained-release corticosteroid implants. This approach achieves mean absolute errors in post-treatment VA prediction of $3.07 \pm 0.82$ and $4.20 \pm 2.79$ Early Treatment Diabetic Retinopathy Study (ETDRS) letters, respectively, falling within the acceptable range of clinical measurement variability and meeting thresholds for statistically significant visual change detection with $\geq90\%$ confidence. This work demonstrates that appropriately designed multimodal architectures can achieve clinically meaningful prediction accuracy even with limited datasets, offering a practical foundation for personalized DME management in typical clinical settings where large datasets are unavailable.
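To make the two technical ideas in the abstract concrete, the following is a minimal sketch of attention-based cross-modal fusion and of the mean absolute error metric used to report results. It is an illustration only and not the authors' architecture (see the linked repository for that). It assumes the clinical parameters and OCT features have already been embedded into the same dimension, it omits the learned query/key/value projections a trained model would have, and the function names `cross_attention_fuse` and `mean_absolute_error` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(clinical_query, image_tokens):
    """Attend from one clinical embedding over a list of OCT feature tokens.

    Assumption: both inputs already live in the same d-dimensional space.
    A trained model would apply learned projections first; they are left
    out here to keep the sketch short.
    """
    d = len(clinical_query)
    # Scaled dot-product score between the clinical query and each image token.
    scores = [
        sum(q * k for q, k in zip(clinical_query, token)) / math.sqrt(d)
        for token in image_tokens
    ]
    weights = softmax(scores)
    # Attention-weighted sum of the image tokens.
    attended = [
        sum(w * token[i] for w, token in zip(weights, image_tokens))
        for i in range(d)
    ]
    # Fuse by concatenating the clinical embedding with the attended image summary.
    fused = list(clinical_query) + attended
    return fused, weights


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error, here in ETDRS letters."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy inputs (made up for illustration, not data from the paper).
clinical = [1.0, 0.0]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = cross_attention_fuse(clinical, tokens)
mae = mean_absolute_error([70, 65, 80], [72, 61, 80])
```

In this toy case the image tokens that align with the clinical embedding receive the larger attention weights, and the fused vector would then be passed to a regression head that predicts post-treatment VA.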
Primary Subject Area: Integration of Imaging and Clinical Data
Secondary Subject Area: Application: Ophthalmology
Paper Type: Both
Registration Requirement: Yes
Reproducibility: https://github.com/muanderson/VA_MM_DL
Submission Number: 191