Keywords: Reproducibility, Computational Reproducibility, Automation, MIDL
TL;DR: Presenting an automated framework for evaluating code reproducibility
Abstract: Reproducibility remains a critical challenge in deep learning for medical imaging, limiting the reliability and clinical adoption of published research. An automated framework is presented to assess key reproducibility factors (dependencies, training/evaluation code, weights, documentation, and licensing) by analyzing GitHub repositories. Validated on manually annotated MIDL 2024 submissions, the system achieves 66.8%-96.9% accuracy across criteria. Applied to 3,682 papers from MIDL, MICCAI, Nature, and arXiv, the framework reveals widespread gaps, particularly in the sharing of model weights and documentation. This approach enables scalable, objective reproducibility assessments and lays the groundwork for integration into peer review workflows. The source code and a live demo are available online (https://huggingface.co/spaces/attilasimko/reproduce).
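To make the assessed criteria concrete, the following is a minimal illustrative sketch (not the authors' implementation) of how a repository could be scanned for the reproducibility factors named in the abstract; the specific file-name patterns (e.g., requirements.txt, train.py) are assumptions chosen for illustration.

```python
# Illustrative sketch only: check a locally cloned repository for evidence of each
# reproducibility criterion listed in the abstract. File-name patterns are assumptions.
from pathlib import Path

CRITERIA = {
    "dependencies": ["requirements.txt", "environment.yml", "setup.py", "pyproject.toml"],
    "training_code": ["train.py", "training.py"],
    "evaluation_code": ["eval.py", "evaluate.py", "test.py"],
    "weights": ["*.pt", "*.pth", "*.ckpt", "*.h5"],
    "documentation": ["README.md", "README.rst"],
    "licensing": ["LICENSE", "LICENSE.md", "LICENSE.txt"],
}

def assess_repository(repo_dir: str) -> dict:
    """Return a simple present/absent flag for each reproducibility criterion."""
    root = Path(repo_dir)
    return {
        criterion: any(any(root.rglob(pattern)) for pattern in patterns)
        for criterion, patterns in CRITERIA.items()
    }

if __name__ == "__main__":
    # Example usage: point at a cloned repository and print which criteria are met.
    for criterion, present in assess_repository("path/to/cloned-repo").items():
        print(f"{criterion}: {'found' if present else 'missing'}")
```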
Submission Number: 12