Feature Selection for GPSR Based on Maximal Information Coefficient and Shapley Values

Published: 01 Jan 2024, Last Modified: 20 Nov 2024, Venue: CEC 2024, License: CC BY-SA 4.0
Abstract: Feature selection is critical for improving the interpretability of machine learning models. Genetic Programming (GP) has a built-in feature selection mechanism that explores the search space to include informative features in models. However, this built-in mechanism is insufficient for identifying important features in high-dimensional feature spaces. To overcome this limitation, the paper introduces a novel feature importance measurement based on the Maximal Information Coefficient and Shapley values. The proposed algorithm operates in two stages. In the first stage, it identifies the best individuals from different populations. In the second stage, these best individuals are used to calculate the novel feature importance measurement for each individual feature. The new measurement offers insight into the significance and relevance of the selected features. Regression experiments were conducted on six datasets to assess the effectiveness of the proposed method, and it was compared with two other algorithms. The results indicate that the proposed approach enhances GP performance on high-dimensional datasets while keeping GP trees similar in size to those of standard GP.
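To make the Shapley-value component concrete, the following is a minimal sketch of exact Shapley values over a small feature set. It is not the paper's method: the abstract does not specify how the Maximal Information Coefficient enters the measure, so the coalition value function `toy_value` and the feature names `x1`, `x2`, `x3` are hypothetical stand-ins chosen only for illustration. In practice the value function would presumably score a feature subset using the best individuals from the first stage.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a coalition value function.

    `value` maps a frozenset of feature names to a real-valued score.
    The cost is exponential in the number of features, so this is only
    practical for small feature sets.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when it joins coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(subset):
    """Hypothetical score: x1 and x2 are informative and interact, x3 is noise."""
    score = 0.0
    if "x1" in subset:
        score += 0.5
    if "x2" in subset:
        score += 0.2
    if "x1" in subset and "x2" in subset:
        score += 0.2  # interaction term, shared equally by x1 and x2
    return score


importance = shapley_values(["x1", "x2", "x3"], toy_value)
```

With this toy value function the irrelevant feature `x3` receives zero importance, and the importances sum to the score of the full feature set, which is the efficiency property that makes Shapley values attractive for ranking features.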