M$^3$-Impute: Mask-guided Representation Learning for Missing Value Imputation

Zhongyi Yu; Zhenghao Wu; Shuhan Zhong; Weifeng Su; S.-H. Chan; Chul-Ho Lee; Weipeng Zhuo

12 May 2024 (modified: 06 Nov 2024). Submitted to NeurIPS 2024. License: CC BY 4.0

Keywords: Missing Value Imputation, Graph Representation Learning, Data Correlations

TL;DR: We introduce M$^3$-Impute, a mask-guided representation learning method for missing value imputation.

Abstract: Missing values are a common problem that poses significant challenges to data analysis and machine learning. Addressing it requires an imputation method that fills in the missing values accurately, thereby improving the overall quality and utility of the dataset. Existing imputation methods, however, neither exploit the 'missingness' information in the data during initialization nor explicitly model the entangled feature and sample correlations during learning, which leads to inferior performance. We propose M$^3$-Impute, which leverages both the missingness information and these correlations through novel masking schemes. M$^3$-Impute first models the data as a bipartite graph and uses an off-the-shelf graph neural network, equipped with a refined initialization process, to learn node embeddings. These embeddings are then optimized through M$^3$-Impute’s novel feature correlation unit (FCU) and sample correlation unit (SCU), which enable explicit consideration of feature and sample correlations for imputation. Experimental results on 15 benchmark datasets under three different missing patterns demonstrate the effectiveness of M$^3$-Impute, which achieves 13 best and 2 second-best MAE scores on average.
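
The bipartite-graph view and the MAE metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that sample nodes and feature nodes are linked by one edge per observed entry, and the helper names (`to_bipartite_edges`, `masked_mae`) and toy data are hypothetical. A plain column-mean fill stands in for the learned imputer (GNN, FCU, and SCU are not modeled here) so that the metric has something to score.

```python
import numpy as np

def to_bipartite_edges(X):
    """Turn a data matrix with NaNs into a bipartite edge list.

    Assumption: one edge (sample i, feature j) per observed entry,
    with the observed value as the edge attribute.
    """
    mask = ~np.isnan(X)                      # True where a value is observed
    sample_idx, feature_idx = np.nonzero(mask)
    return sample_idx, feature_idx, X[sample_idx, feature_idx], mask

def masked_mae(X_true, X_imputed, missing):
    """Mean absolute error computed only over the missing entries."""
    return float(np.abs(X_true[missing] - X_imputed[missing]).mean())

# Toy ground truth and a copy with two entries hidden (hypothetical data).
X_true = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
X_obs = X_true.copy()
X_obs[0, 1] = np.nan
X_obs[1, 0] = np.nan

sample_idx, feature_idx, edge_vals, mask = to_bipartite_edges(X_obs)

# Stand-in imputer: fill each missing entry with its column's observed mean.
col_means = np.nanmean(X_obs, axis=0)
X_imputed = np.where(mask, X_obs, col_means)

score = masked_mae(X_true, X_imputed, ~mask)
```

The boolean `mask` carries the missingness information that the abstract says existing methods fail to exploit; a learned model would consume it together with the edge list instead of the column-mean fill used here.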

Supplementary Material: zip

Primary Area: Other (please use sparingly, only use the keyword field for more details)

Submission Number: 5431
