Reproducing Machine Learning Research on Binder

Jessica Forde, Matthias Bussonnier, Félix-Antoine Fortin, Brian Granger, Tim Head, Chris Holdgraf, Paul Ivanov, Kyle Kelley, M Pacer, Yuvi Panda, Fernando Perez, Gladys Nalvarte, Benjamin Ragan-Kelley, Zachary Sailer, Steven Silvester, Erik Sundell, Carol Willing

Sep 30, 2018, NIPS 2018 Workshop MLOSS Submission
  • Abstract: Binder is an open-source project that lets users share interactive, reproducible science. Its goal is to let researchers create interactive versions of their code using their existing workflows and minimal additional effort. Binder relies on configuration files that are already standard in software engineering, so researchers can build interactive versions of code hosted on commonly used platforms such as GitHub. Binder’s underlying technology, BinderHub, is entirely open source and is built entirely from open-source tools. By leveraging tools such as Kubernetes and Docker, it manages the technical complexity of creating containers that capture a repository and its dependencies, generating user sessions, and providing public URLs for sharing the built images with others. BinderHub combines two open-source projects within the Jupyter ecosystem: repo2docker and JupyterHub. repo2docker builds a Docker image from the git repository specified by the user, installs its dependencies, and provides various front-ends for exploring the image. JupyterHub then spawns and serves instances of these built images, using Kubernetes to scale as needed. Because each of these pieces is open source and uses popular cloud orchestration tools, BinderHub can be deployed on a variety of cloud platforms, or even on your own hardware.
  • TL;DR: Binder is an open-source project that lets users share interactive, reproducible science.
  • Keywords: reproducibility, open source, machine learning, kubernetes, docker, jupyterhub
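
As a rough illustration of the "public URLs" mentioned in the abstract, the sketch below assembles a launch link for a GitHub repository using the /v2/gh/<org>/<repo>/<ref> pattern of the public mybinder.org deployment. The function name, the default ref, and the example repository are assumptions made for illustration; they are not taken from the submission, and other BinderHub deployments may use a different host.

```python
from urllib.parse import quote


def binder_launch_url(org, repo, ref="HEAD", host="https://mybinder.org"):
    """Build a Binder launch URL for a GitHub repository.

    Hypothetical helper: it only formats a string and does not contact
    any Binder service or check that the repository exists.
    """
    # Percent-encode each path segment so unusual characters stay inside it.
    parts = [quote(part, safe="") for part in (org, repo, ref)]
    return f"{host.rstrip('/')}/v2/gh/{'/'.join(parts)}"


# Example repository chosen for illustration only.
url = binder_launch_url("binder-examples", "requirements", "master")
print(url)
```

Opening such a link asks the deployment to build an image from the repository's configuration files (for example a requirements.txt or environment.yml) if one is not already cached, and then to start a session from it.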