DSWorkFlow: A Framework for Capturing Data Scientists' Workflows

Published: 01 Jan 2021, Last Modified: 15 Oct 2024. Venue: CHI Extended Abstracts 2021. License: CC BY-SA 4.0
Abstract: While machine learning algorithms continue to improve, their success often relies upon the data scientist’s ability to detect patterns, determine useful features and visualizations, select good models, and evaluate and iterate upon results. Data scientists often spend a long time making very little progress as they struggle to determine how to proceed. For this reason, data scientists’ workflows and challenges have recently attracted a great deal of scholarly interest. However, the literature is based mostly on interviews and qualitative research methodologies. With this in mind, we developed DSWorkFlow, a data collection framework that lets researchers observe and analyze data scientists’ cognitive workflows as they develop predictive models. Using DSWorkFlow, researchers can collect data from a Jupyter Notebook to reconstruct the code execution order and extract relevant information about the data scientist’s workflow, while collecting qualitative data at the same time. To inform our extraction algorithms, we tested the framework experimentally with seven data scientists, each of whom created three machine learning models.
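As an illustration of what reconstructing the code execution order can involve, the sketch below reads a saved notebook in the standard Jupyter JSON format and sorts its code cells by their `execution_count` field. This is not DSWorkFlow's extraction algorithm, which the abstract does not describe; the function name `execution_order` and the sample notebook are hypothetical. A saved notebook keeps only the most recent execution count per cell, so a sketch like this cannot recover earlier runs of a cell that was executed more than once.

```python
import json


def execution_order(notebook_json: str) -> list[tuple[int, str]]:
    """Return (execution_count, source) pairs for executed code cells,
    sorted by the order in which the kernel ran them.

    Hypothetical helper for illustration; not part of DSWorkFlow.
    """
    notebook = json.loads(notebook_json)
    executed = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        count = cell.get("execution_count")
        if count is None:
            continue  # the cell was never run in the saved session
        source = cell.get("source", "")
        if isinstance(source, list):
            # nbformat allows the source to be stored as a list of lines
            source = "".join(source)
        executed.append((count, source))
    return sorted(executed, key=lambda pair: pair[0])


# Made-up notebook whose cells were run out of their on-page order.
sample = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "execution_count": 2,
         "source": "df = pd.read_csv('train.csv')", "outputs": [], "metadata": {}},
        {"cell_type": "code", "execution_count": 1,
         "source": "import pandas as pd", "outputs": [], "metadata": {}},
        {"cell_type": "markdown", "source": "## Modelling", "metadata": {}},
        {"cell_type": "code", "execution_count": None,
         "source": "model.fit(X, y)", "outputs": [], "metadata": {}},
        {"cell_type": "code", "execution_count": 3,
         "source": ["X = df.drop(columns=['y'])\n", "y = df['y']"],
         "outputs": [], "metadata": {}},
    ],
})

ordered = execution_order(sample)
for count, source in ordered:
    print(count, source.splitlines()[0])
```

Sorting by `execution_count` rather than by position on the page matters because data scientists frequently jump back to earlier cells, so the top-to-bottom layout of a notebook does not necessarily match the order in which the code actually ran.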