- Abstract: The use of machine learning techniques has expanded in education research, driven by the rich data from digital learning environments and institutional data warehouses. However, replication of machine learned models in the domain of the learning sciences is particularly challenging due to a confluence of experimental, methodological, and data barriers. We discuss the challenges of end-to-end machine learning replication in this context, and present an open-source software toolkit, the MOOC Replication Framework (MORF), to address them. We demonstrate the use of MORF by conducting a replication at scale, and provide a complete executable container, with unique DOIs documenting the configurations of each individual trial, for replication or future extension at https://github.com/educational-technology-collective/fy2015-replication. This work demonstrates an approach to end-to-end machine learning replication which is relevant to any domain with large, complex or multi-format, privacy-protected data with a consistent schema.
- Keywords: replication, containerization, moocs, machine learning, model selection
- TL;DR: We present a tool for replicable machine learning at scale, and an application to a large MOOC dropout prediction experiment.