Abstract: Context: In scientific software, the inability to reproduce results is often due to technical issues and challenges in recreating the full computational workflow from the original analysis. We conceptualise this problem as Reproducibility Debt (RpD). Much research has proposed solutions to these issues across various computational science disciplines. It is essential to identify and consolidate existing knowledge on reproducibility issues and state-of-the-art solutions, so as to provide researchers and practitioners with information that enables further research and RpD management in practice. Objective: In the context of scientific software, we aim to characterise RpD by providing a taxonomy of the issues that contribute to its emergence and identification (causes, effects) and of the common solutions discussed in the existing literature. Method: We conducted a systematic literature review, considering 2198 studies up to January 2024, of which 214 were included as primary studies. Results: We propose the first taxonomy of RpD items, consisting of 37 causes contributing to its emergence, 63 corresponding effects under seven main categories, and 29 prevention strategies. We also identify 39 specialised tools/frameworks supporting reproducibility. Conclusion: The main contributions of this work are (1) a formal definition of RpD; (2) a taxonomy of issues contributing towards RpD; (3) a list of causes and effects, with implications for how software professionals identify and measure RpD in their projects; (4) a list of strategies and tools to prevent or remove RpD; (5) the identification of gaps in existing research to guide future studies.
External IDs: doi:10.1016/j.jss.2024.112327