Abstract: Medical image computing (MIC) is devoted to computational methods for the analysis of medical imaging data and to the assessment of these methods through experiments. It is thus an experimental science. Reproducibility is a cornerstone of progress in all experimental sciences. As in many other fields, there are major concerns that reproducibility is unsatisfactory in MIC. However, reproducibility is not a single concept but a spectrum, a fact that is often misunderstood by researchers. Moreover, even though some measures have been put in place to promote reproducibility in the MIC community, it is unclear whether they have been effective so far.
The objectives of the present paper are three-fold: i) to provide readers with the necessary concepts underlying reproducibility in MIC; ii) to describe the measures that have been put in place and to assess some of them; iii) to sketch possible new actions that could be taken.
First, we present a conceptual framework that distinguishes between different types of reproducibility as well as the main building blocks of reproducible research. We then describe how reproducibility is currently assessed at the MICCAI (Medical Image Computing and Computer Assisted Intervention) conference. In particular, we perform a quantitative analysis of MICCAI reviews, which reveals that, on the matter of reproducibility, reviews are unreliable and uninformative. Furthermore, we unveil questionable practices by some authors. Finally, we summarize the current state of affairs and suggest potential actions that could be discussed within the community to progress towards more reproducible research. We insist that reproducibility is a spectrum, that there will never be a "one-size-fits-all" model, but that there is plenty of room for improvement across all types of reproducibility.
The code and data to reproduce the results of this paper are available at: https://github.com/reproducibility-reviews/reproducibility-reviews.