\FloatBarrier
\section{Conclusion}
The increasing clinical prevalence of AI-based reconstruction models creates a critical need for quantitative assessments of their potential downstream impact.
We performed a scalable evaluation by using reconstruction and diagnostic AI models in tandem across multiple datasets, tasks, pathologies, and model types. We view our results as largely positive for the field: downstream performance was much more robust to reconstruction noise than image-level metrics were, and the biases introduced by reconstruction were generally modest. However, we observed some trends of increased bias, especially with respect to patient sex. Based on these findings, we argue for the importance of monitoring downstream performance and fairness when using AI-based reconstruction models, and for continued work to mitigate emerging biases.
