\documentclass[../midl25_191.tex]{subfiles}
\begin{document}
\label{sec:discussion}
Our multimodal framework achieves clinically meaningful error margins: average prediction errors remain below 5 ETDRS letters on both datasets, within the threshold for statistically detectable change ($\geq 90\%$ confidence) in eyes with VA better than 20/100~\cite{beck2007visual}. The lower error on DIME (MAE: $3.07 \pm 0.82$) than on INDEX (MAE: $4.20 \pm 2.79$) is likely attributable to higher OCT resolution and a larger sample size. Analysis across VA ranges revealed distinct prediction patterns: DIME exhibited regression toward the mean (over-prediction in the low VA range, under-prediction in the high VA range), while INDEX showed an under-prediction bias in the higher VA ranges, likely influenced by its smaller sample size ($n = 2$ for high VA).
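% Illustrative formalization only: the symbols y_i, \hat{y}_i, R, n_R and Bias_R are assumptions introduced here, not notation defined elsewhere in the paper.
One way to make these range-wise patterns precise, under the assumption that errors are measured in ETDRS letters and using notation introduced here purely for illustration, is to let $y_i$ and $\hat{y}_i$ denote the observed and predicted VA of eye $i$, and $R$ a VA range containing $n_R$ eyes. The overall error and the signed error within a range can then be written as
\[
  \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| ,
  \qquad
  \mathrm{Bias}_R = \frac{1}{n_R} \sum_{i \in R} \left( \hat{y}_i - y_i \right) .
\]
In this sketch, $\mathrm{Bias}_R > 0$ indicates over-prediction and $\mathrm{Bias}_R < 0$ under-prediction, so regression toward the mean corresponds to a positive bias in the low VA range together with a negative bias in the high VA range. An estimate of $\mathrm{Bias}_R$ with $n_R = 2$ rests on only two eyes and should be interpreted with corresponding caution.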
Our ablation studies confirm the critical importance of baseline VA as a clinical predictor: replacing it significantly degraded performance on both datasets. The cross-modal attention mechanism also proved essential for accurate predictions, particularly for treatment-naïve patients (25\% error increase when removed). These findings support our feature selection and architectural design choices for small clinical datasets.
Feature attribution analysis identifies OCT volumes as the primary predictive signal, complemented by clinical parameters, consistent with prior findings on the relevance of both anatomical and clinical factors in DME progression~\cite{antonetti2006diabetic}. Grad-CAM visualizations showed anatomically relevant attention patterns corresponding to established OCT biomarkers~\cite{zur2018oct,sun2014disorganization}, with differential activation between treatment-naïve and chronic cases.
Limitations include systematic biases that require range-specific calibration, small dataset sizes that necessitated $k$-fold cross-validation, and heterogeneity in OCT protocols and treatment modalities between the datasets, which complicates cross-cohort comparison. Prospective validation with expert-graded OCT characteristics would help establish clinical utility.
\end{document}