Detecting Code Smells in JavaScript: An Annotated Dataset for Software Quality Analysis

Diego S. Sarafim, Karina Valdivia Delgado, Daniel Cordeiro

Published: 01 Jan 2024, Last Modified: 09 Oct 2025, SBES 2024, CC BY-SA 4.0

Abstract: The level of source code quality attained during the development phase is an important factor in the costs incurred in later stages of software development. Among the most detrimental quality problems are code smells: violations of programming principles and good practices that negatively affect the maintainability and evolution of computer programs. Over the last decades, much effort has been put into creating tools for code smell detection. A promising approach relies on machine learning (ML) algorithms for automated smell detection. These algorithms usually need datasets whose labeled instances indicate the presence or absence of smells in programming constructs such as classes and methods. Despite the considerable number of studies using ML for code smell detection, few have adopted this approach for programming languages other than Java. Even widely popular languages such as JavaScript have received little or no attention regarding the use of ML models for smell detection, despite their lexical, structural, and paradigm differences from Java. A symptom of this gap is the absence of standard code smell datasets for JavaScript in the literature. This work presents a new dataset for code smell detection in JavaScript software, focused on God Class and Long Method, two of the most prevalent and harmful code smells. We describe the strategy used to construct the dataset and its characteristics, and report preliminary experiments that apply ML models for code smell detection to it.
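To make the idea of "labeled instances" concrete, here is a minimal sketch of how a method-level record might pair structural metrics with a presence/absence label for Long Method, and how a detector can be fit to such records. Everything here is an assumption for illustration: the record fields (`loc`, `num_params`), the helper names (`fit_stump`, `precision_recall`), and the toy values are hypothetical and are not taken from the dataset described above. A one-feature decision stump stands in for the ML models used in the paper's experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodInstance:
    """One hypothetical labeled instance: metrics of a JavaScript function plus a smell label."""
    name: str
    loc: int              # lines of code (hypothetical feature)
    num_params: int       # number of parameters (hypothetical feature)
    is_long_method: bool  # label: presence/absence of the Long Method smell


def fit_stump(instances, feature):
    """Fit a one-feature decision stump that predicts 'smelly' iff feature > threshold.

    Every observed feature value is tried as a threshold; the one with the fewest
    training errors is kept (ties go to the smallest threshold).
    """
    best_threshold, best_errors = None, None
    for t in sorted({getattr(x, feature) for x in instances}):
        errors = sum((getattr(x, feature) > t) != x.is_long_method for x in instances)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def predict(threshold, instance, feature):
    """Return True if the stump flags this instance as a Long Method."""
    return getattr(instance, feature) > threshold


def precision_recall(instances, threshold, feature):
    """Compute precision and recall of the stump on a list of labeled instances."""
    tp = sum(predict(threshold, x, feature) and x.is_long_method for x in instances)
    fp = sum(predict(threshold, x, feature) and not x.is_long_method for x in instances)
    fn = sum(not predict(threshold, x, feature) and x.is_long_method for x in instances)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, hand-made records (hypothetical; not drawn from any real dataset).
train = [
    MethodInstance("init", 12, 1, False),
    MethodInstance("render", 25, 2, False),
    MethodInstance("handleSubmit", 40, 3, False),
    MethodInstance("processOrder", 95, 5, True),
    MethodInstance("buildReport", 130, 4, True),
    MethodInstance("syncAll", 210, 6, True),
]
test = [
    MethodInstance("parse", 60, 2, False),
    MethodInstance("migrate", 150, 3, True),
]

threshold = fit_stump(train, "loc")
print("LOC threshold:", threshold)
print("held-out precision/recall:", precision_recall(test, threshold, "loc"))
```

On these toy records the stump learns a LOC threshold of 40 and misclassifies the 60-line `parse` function, which illustrates why real datasets need many diverse, carefully labeled instances rather than a single size cutoff.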