ML Reproducibility Systems: Status and Research Agenda

Anonymous

02 Nov 2021 (modified: 05 May 2023) · Submitted to JSYS Nov 21
Keywords: ML, Reproducibility, MLOps, Model Lifecycle
TL;DR: A framework to evaluate the reproducibility capabilities of different ML systems and an evaluation of 12 systems using the framework.
Abstract: As companies increasingly leverage the power of machine learning (ML), the models, the data they were trained on, and the training pipelines themselves are becoming important assets. In response, a new set of tools is being developed that aims to manage the ML model lifecycle in a more structured way. Because the model lifecycle involves much experimental back-and-forth, reproducibility is an important capability that such tools need to provide. However, as model lifecycle management is an emerging field, few best practices for supporting reproducibility exist. This has led to a large variety of tools that all share the goal of providing reproducibility but differ substantially in how they achieve it. As a result, users are overwhelmed with navigating this vast tooling landscape and choosing a tool that best fits their needs. This is a difficult task because the reproducibility capabilities of different tools vary, and users must determine for themselves what is supported and which functionality fits their specific use case. In this paper, our goal is to add structure to the process of deciding on a specific tool in terms of its reproducibility capabilities. We identify the most significant artifacts of the ML model lifecycle and, based on these, propose a generic classification framework for assessing the reproducibility capabilities of a given tool and making it comparable to other tools. To evaluate our framework, we conduct an analysis of 12 popular ML lifecycle management tools. We study each tool in detail and classify it according to our framework. We then compare the tools to each other to determine what degree of reproducibility they provide in general and where reproducibility support is still lacking. While we find that the majority of tools offer most features required for full reproducibility, gaps remain in automating reproducibility and in cross-tool reproducibility. Based on these findings, we provide a set of research challenges that need to be addressed in order to better understand reproducibility and to close the gaps in current systems.
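As a rough illustration of the kind of classification framework the abstract describes, one could encode each tool as a profile mapping lifecycle artifacts to support levels and then compare profiles. The sketch below is not the paper's actual taxonomy: the artifact names ("data", "model", "pipeline_code", "environment", "metadata"), the support levels, and the `coverage` metric are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Support(Enum):
    """Hypothetical levels of reproducibility support for one artifact."""
    NONE = 0        # artifact is not captured at all
    MANUAL = 1      # user must record/version the artifact by hand
    AUTOMATIC = 2   # tool captures and versions the artifact itself


# Placeholder set of lifecycle artifacts; the paper derives its own set.
ARTIFACTS = ("data", "model", "pipeline_code", "environment", "metadata")


@dataclass
class ToolProfile:
    """Reproducibility profile of one ML lifecycle management tool."""
    name: str
    support: dict[str, Support] = field(default_factory=dict)

    def coverage(self) -> float:
        """Fraction of artifacts with at least manual support."""
        covered = sum(
            1 for a in ARTIFACTS
            if self.support.get(a, Support.NONE) != Support.NONE
        )
        return covered / len(ARTIFACTS)


# Comparing two hypothetical tool profiles side by side.
tool_a = ToolProfile("ToolA", {
    "data": Support.AUTOMATIC,
    "model": Support.AUTOMATIC,
    "pipeline_code": Support.MANUAL,
})
tool_b = ToolProfile("ToolB", {a: Support.MANUAL for a in ARTIFACTS})

for tool in (tool_a, tool_b):
    print(f"{tool.name}: {tool.coverage():.0%} of artifacts covered")
```

Encoding tool capabilities in a uniform structure like this is what makes the tools directly comparable, which is the core promise of such a framework.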
Area: Data Science and Reproducibility
Type: Systemization of Knowledge (SoK)
Conflicts: All(IBM Research - Almaden), All(UC Santa Cruz), Alex Uta, Lukas Rupprecht, Tanu Malik, Ivo Jimenez
Potential Reviewers: Ana Trisovic, George Thiruvathukal, Mohammad Akhlaghi, Sebastian Schelter