Genetic Programming for Imputation Predictor Selection and Ranking in Symbolic Regression with High-Dimensional Incomplete Data

Published: 01 Jan 2019, Last Modified: 20 Nov 2024. Venue: Australasian Conference on Artificial Intelligence 2019. License: CC BY-SA 4.0
Abstract: Incompleteness is one of the challenging issues in data science. One approach to tackling it is to use imputation methods to estimate the missing values in incomplete data sets. Despite the popularity of this approach in many machine learning tasks, it has rarely been investigated in symbolic regression. In this work, a genetic programming (GP) based feature selection and ranking method is proposed and applied to high-dimensional symbolic regression with incomplete data. The main idea is to construct GP programs for each incomplete feature, using the other features as predictors. The predictors selected by these GP programs are then ranked based on two criteria: the fitness values of the best constructed GP programs and the frequency with which each predictor occurs in these programs. The experiments are conducted on high-dimensional data in which the number of features is greater than the number of instances.
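The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the function name `rank_predictors`, the input format (one `(fitness, predictors)` pair per best GP program for a single incomplete feature), the treatment of fitness as an error where lower is better, and the weight `1 / (1 + fitness)` used to combine fitness with occurrence frequency. It is not the authors' implementation.

```python
from collections import defaultdict


def rank_predictors(best_programs):
    """Rank candidate predictors for one incomplete feature.

    best_programs: list of (fitness, predictors) pairs, one per best GP
    program. ASSUMPTION: fitness is an error value (lower is better) and
    predictors lists the feature names that appear in that program.
    """
    scores = defaultdict(float)
    for fitness, predictors in best_programs:
        # ASSUMPTION: a program with lower error contributes a larger weight.
        weight = 1.0 / (1.0 + fitness)
        # Count each predictor once per program, so the score grows with
        # how many programs use it (frequency) and how good they are (fitness).
        for name in set(predictors):
            scores[name] += weight
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical results of three GP runs for one incomplete feature.
runs = [
    (0.10, ["x1", "x3"]),
    (0.25, ["x1", "x2"]),
    (0.40, ["x3"]),
]
print(rank_predictors(runs))  # ['x1', 'x3', 'x2']
```

In this toy input, `x1` ranks first because it appears in the two best programs, while `x2` ranks last because it appears in only one. A different weighting of fitness against frequency could change the order.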