Multiple Imputation for Missing Data Using Genetic Programming

Published: 2015, Last Modified: 02 Oct 2024, GECCO 2015, CC BY-SA 4.0
Abstract: Missing values are a common problem in many real-world databases. Inadequate handling of missing data can lead to serious problems in data analysis. A common way to cope with this problem is to use imputation methods, which fill in missing values with plausible estimates. This paper proposes GPMI, a multiple imputation method that uses genetic programming as a regression method to estimate missing values. Experiments on eight datasets with six levels of missing values compare GPMI with seven other popular and advanced imputation methods on two measures: prediction accuracy and classification accuracy. The results show that, in most cases, GPMI achieves not only better prediction accuracy but also better classification accuracy than the other imputation methods.
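The abstract describes GPMI only at a high level: genetic programming (GP) acts as the regression model that predicts a missing value from the observed attributes, and several imputations are produced. The Python sketch below illustrates that general idea; it is not the authors' GPMI. All of the following are assumptions made for illustration and do not come from the paper: missing values occur in a single target column and are marked `None`, the other attributes are complete, the GP uses `+`, `-`, `*` and protected division with size-3 tournament selection, subtree crossover and subtree mutation, and the multiple imputations are pooled by averaging the predictions of independently seeded GP runs. Names such as `gp_multiple_impute` and `evolve` are hypothetical.

```python
import math
import random

# Trees are nested tuples: ("x", i) reads feature i, ("c", v) is a constant,
# and (op, left, right) applies a binary operator from OPS.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # protected division
}


def random_tree(rng, n_features, depth):
    """Grow a random tree whose depth is at most `depth`."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_features))
        return ("c", rng.uniform(-1.0, 1.0))
    return (rng.choice(list(OPS)), random_tree(rng, n_features, depth - 1),
            random_tree(rng, n_features, depth - 1))


def evaluate(tree, row):
    if tree[0] == "x":
        return row[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))


def subtrees(tree, path=()):
    """Yield (path, node) pairs; len(path) is the node's depth."""
    yield path, tree
    if tree[0] in OPS:
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)


def mse(tree, X, y):
    """Mean squared error; non-finite results are scored as infinitely bad."""
    total = 0.0
    for row, target in zip(X, y):
        err = evaluate(tree, row) - target
        total += err * err
    total /= len(y)
    return total if math.isfinite(total) else math.inf


def evolve(X, y, rng, pop_size=50, generations=25, max_depth=4):
    """Evolve one GP regression tree (size-3 tournaments, elitism of 1)."""
    n_features = len(X[0])
    pop = [random_tree(rng, n_features, max_depth) for _ in range(pop_size)]

    def pick(scored):  # tournament selection among 3 random individuals
        return min(rng.sample(scored, 3), key=lambda s: s[0])[1]

    for _ in range(generations):
        scored = [(mse(t, X, y), t) for t in pop]
        children = [min(scored, key=lambda s: s[0])[1]]  # keep the best tree
        while len(children) < pop_size:
            parent = pick(scored)
            path, _ = rng.choice(list(subtrees(parent)))
            if rng.random() < 0.5:  # subtree crossover with a second parent
                _, donor = rng.choice(list(subtrees(pick(scored))))
                child = replace_at(parent, path, donor)
                if max(len(p) for p, _ in subtrees(child)) > max_depth:
                    child = parent  # reject offspring that are too deep
            else:  # subtree mutation with a depth-bounded random subtree
                new = random_tree(rng, n_features, max_depth - len(path))
                child = replace_at(parent, path, new)
            children.append(child)
        pop = children
    return min(pop, key=lambda t: mse(t, X, y))


def gp_multiple_impute(rows, target_col, n_imputations=5, seed=0):
    """Fill None in column `target_col` (other columns must be complete) with
    the mean prediction of independently seeded GP models (hypothetical rule)."""
    def features(r):
        return [v for j, v in enumerate(r) if j != target_col]
    complete = [r for r in rows if r[target_col] is not None]
    X, y = [features(r) for r in complete], [r[target_col] for r in complete]
    models = [evolve(X, y, random.Random(seed + k)) for k in range(n_imputations)]
    result = []
    for r in rows:
        if r[target_col] is None:
            preds = [evaluate(m, features(r)) for m in models]
            r = r[:target_col] + [sum(preds) / len(preds)] + r[target_col + 1:]
        result.append(r)
    return result
```

For example, on rows of the form `[a, b, a + 2*b]` with the target entry of two rows set to `None`, `gp_multiple_impute(rows, target_col=2, n_imputations=3)` returns a new list in which those two entries are filled and the complete rows are passed through as they were. Averaging is only one possible pooling rule; classical multiple imputation instead keeps the completed datasets separate and combines the results of the downstream analyses.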