Enabling End-To-End Machine Learning Replicability: A Case Study in Educational Data Mining

Published: 27 Jun 2018, Last Modified: 05 May 2023
ICML 2018 RML Submission
Readers: Everyone
Abstract: The use of machine learning techniques has expanded in education research, driven by the rich data from digital learning environments and institutional data warehouses. However, replication of machine-learned models in the learning sciences is particularly challenging due to a confluence of experimental, methodological, and data barriers. We discuss the challenges of end-to-end machine learning replication in this context and present an open-source software toolkit, the MOOC Replication Framework (MORF), to address them. We demonstrate the use of MORF by conducting a replication at scale, and provide a complete executable container, with unique DOIs documenting the configurations of each individual trial, for replication or future extension at https://github.com/educational-technology-collective/fy2015-replication. This work demonstrates an approach to end-to-end machine learning replication that is relevant to any domain with large, complex or multi-format, privacy-protected data with a consistent schema.
Keywords: replication, containerization, moocs, machine learning, model selection
TL;DR: We present a tool for replicable machine learning at scale, and an application to a large MOOC dropout prediction experiment.