Predicting the understandability of computational notebooks through code metrics analysis

Published: 01 Jan 2025, Last Modified: 22 May 2025 · Empir. Softw. Eng. 2025 · CC BY-SA 4.0
Abstract: Computational notebooks have become the primary coding environment for data scientists. Despite their popularity, research on the code quality of these notebooks is still in its infancy, and the code shared in them is often of poor quality. Given the importance of maintenance and reusability, it is crucial to attend to the understandability of notebook code and to identify the notebook metrics that play a significant role in it. Code understandability is a qualitative variable closely tied to users' opinions about the code. Traditional approaches to measuring it either use small questionnaires to review a few code snippets or rely on metadata such as likes and votes in software repositories. In our approach, we measure the understandability of Jupyter notebooks by leveraging user opinions about code understandability within a software repository. As a case study, we started with 542,051 Kaggle Jupyter notebooks, compiled in a dataset named DistilKaggle, which we introduced in our previous research. To identify user comments associated with code understandability, we used a fine-tuned DistilBERT transformer. We then defined a user-opinion-based criterion for measuring code understandability that considers the number of understandability-related comments, the upvotes on those comments, and the notebook's total views. We refer to this criterion as User Opinion Code Understandability (UOCU); in our experiments it proved considerably more effective than previous approaches, and a hybrid criterion combining UOCU with total notebook upvotes improved it further. Finally, we trained machine learning models to classify notebook understandability solely from notebook metrics. Using the hybrid criterion, we collected 34 metrics for the 132,723 notebooks retained in the final dataset. Our predictive model, a Random Forest classifier, achieved 89% accuracy in classifying the code understandability level of computational notebooks.
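The sketch below is not the authors' released pipeline; it is a minimal illustration of the two-stage idea the abstract describes: deriving a UOCU-style label from user-opinion signals, then training a Random Forest on notebook metrics to predict that label. The column names, the exact combination formula, and the synthetic data are all assumptions made for illustration, and the DistilBERT comment-classification step is stubbed out as precomputed per-notebook comment counts.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1000

# Hypothetical per-notebook signals: counts of understandability-related
# comments (assumed already flagged by a fine-tuned DistilBERT classifier),
# upvotes on those comments, and total notebook views.
signals = pd.DataFrame({
    "understandability_comments": rng.poisson(2, n),
    "comment_upvotes": rng.poisson(5, n),
    "views": rng.integers(100, 10_000, n),
})

# UOCU-style score (illustrative formula, not the paper's exact definition):
# feedback volume weighted by its upvotes, normalized by exposure so that
# merely popular notebooks are not favored.
uocu = (signals["understandability_comments"]
        + signals["comment_upvotes"]) / signals["views"]

# Discretize the continuous score into understandability levels
# (terciles here) to obtain classification targets.
labels = pd.qcut(uocu, q=3, labels=["low", "medium", "high"])

# Stand-in for the 34 collected notebook metrics (LOC, cell counts, etc.);
# random here, so this demo only exercises the pipeline and will not
# reproduce the 89% accuracy reported on real data.
metrics = pd.DataFrame(rng.normal(size=(n, 34)),
                       columns=[f"metric_{i}" for i in range(34)])

X_train, X_test, y_train, y_test = train_test_split(
    metrics, labels, test_size=0.2, random_state=0, stratify=labels)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2f}")
```

Normalizing by views in the score is one plausible way to realize the abstract's intent of combining comment counts, comment upvotes, and notebook views; the paper's actual UOCU formula may differ.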