More Rigorous Software Engineering Would Improve Reproducibility in Machine Learning Research

TMLR Paper5707 Authors

22 Aug 2025 (modified: 04 Sept 2025) · Under review for TMLR · CC BY 4.0
Abstract: While experimental reproduction remains a pillar of the scientific method, we observe that the software best practices supporting the reproduction of Machine Learning (ML) research are often undervalued or overlooked, leading both to poor reproducibility and to damaged trust in the ML community. We quantify these concerns by surveying the usage of software best practices in repositories associated with publications at major ML conferences and journals, including NeurIPS, ICML, ICLR, TMLR, and MLOSS, over the last decade. The survey results identify areas where software best practices are lacking, as well as areas with potential for growth in the ML community. Finally, we discuss the implications and present concrete recommendations on how we, as a community, can improve reproducibility in ML research.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: N/A
Assigned Action Editor: ~Jes_Frellsen1
Submission Number: 5707