\documentclass[../midl25_191.tex]{subfiles}
\begin{document}
\label{sec:conclusion}
Our multimodal deep learning approach predicted post-treatment VA in DME patients with mean absolute errors of $3.07 \pm 0.82$ ETDRS letters (treatment-naïve) and $4.20 \pm 2.79$ ETDRS letters (chronically treated). Both mean errors fall below the clinically significant threshold of 5 ETDRS letters, indicating potential for informing treatment decisions. Ablation experiments validated our feature selection approach and cross-modal attention mechanisms, while Grad-CAM analysis revealed pathology-specific attention patterns that differed between patient cohorts.
While these findings highlight the promise of multimodal deep learning for small datasets, further validation in multi-centre studies and on external datasets is needed. Future research should optimize architectures for small datasets, improve generalizability, and address systematic biases, so that such models can provide clinical insights and support personalized treatment planning for DME patients.
\end{document}