The impact of unsupervised feature selection techniques on the performance and interpretation of defect prediction models

Published: 2025, Last Modified: 22 Jan 2026, Autom. Softw. Eng. 2025, CC BY-SA 4.0
Abstract: The performance and interpretation of a defect prediction model depend on the software metrics used to build it. Feature selection techniques can improve both by removing redundant, correlated, and irrelevant metrics from defect datasets. Previous empirical studies have examined how feature selection techniques affect the performance and interpretation of defect prediction models. However, most of the techniques examined in these studies are supervised, so the impact of unsupervised feature selection (UFS) techniques on defect prediction remains unknown and warrants extensive investigation. To address this gap, we systematically apply 21 UFS techniques and evaluate their impact on the performance and interpretation of unsupervised defect prediction models in both binary classification and effort-aware ranking scenarios. We conduct extensive experiments on 28 versions of 8 projects using 4 unsupervised models. We observe that: (1) 10–100% of the selected metrics are inconsistent between each pair of UFS techniques; (2) 29–100% of the selected metrics are inconsistent among different software modules; (3) for unsupervised defect prediction models, some UFS techniques (e.g., AutoSpearman, LS, and FMIUFS) can effectively reduce the number of metrics while maintaining or even improving model performance; and (4) UFS techniques alter the ranking of the top 3 groups of metrics in defect models, which affects how these models are interpreted. Based on these findings, we recommend that software practitioners use UFS techniques for unsupervised defect prediction, but that they exercise caution when deriving insights and interpretations from defect prediction models.
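Findings (1) and (2) report how inconsistent the selected metric subsets are. The abstract does not define the inconsistency measure, so the sketch below *assumes* it is the share of metrics selected by only one of two techniques, i.e. 1 minus the Jaccard similarity of the two subsets, expressed as a percentage. The technique names come from the abstract, but the metric subsets and the helper functions `inconsistency` and `pairwise_inconsistency` are hypothetical and serve only as illustration.

```python
from itertools import combinations


def inconsistency(selected_a: set[str], selected_b: set[str]) -> float:
    """Assumed measure: percentage of metrics chosen by only one of the two
    techniques, i.e. 100 * (1 - Jaccard similarity)."""
    union = selected_a | selected_b
    if not union:
        return 0.0  # two empty selections are trivially consistent
    shared = selected_a & selected_b
    return 100.0 * (1 - len(shared) / len(union))


def pairwise_inconsistency(selections: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Compute the assumed inconsistency for every pair of techniques."""
    return {
        (a, b): inconsistency(selections[a], selections[b])
        for a, b in combinations(sorted(selections), 2)
    }


# Hypothetical metric subsets; the metric names are illustrative only.
selections = {
    "AutoSpearman": {"loc", "cbo", "rfc", "lcom"},
    "LS": {"loc", "wmc", "rfc"},
    "FMIUFS": {"dit", "noc"},
}

for pair, pct in pairwise_inconsistency(selections).items():
    print(pair, f"{pct:.0f}%")
```

With these made-up subsets, AutoSpearman and LS share 2 of 5 distinct metrics (60% inconsistent), while FMIUFS shares none with either (100%). That spread mirrors the kind of range the abstract reports, though it is not the paper's data.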