Visible to everyone since 04 Oct 2024. License: CC BY 4.0
Handling missing data is a major challenge in machine learning, where missing values are common in real-world datasets. This work introduces a hypergraph representation constructed directly from datasets that contain missing values, without relying on traditional techniques such as deletion or imputation. The construction preserves the relationships between variables and models multi-variable interactions, which enables the model to capture dataset structure that other methods may overlook. The proposed hypergraph learning method can be applied to both classification and regression tasks. For real-world evaluation, we use the MIMIC-III and Adult datasets, focusing on classification performance. Additionally, synthetic datasets with controlled missingness are used to evaluate the method's effectiveness across varying degrees of missingness. Compared with imputation and prediction techniques, the hypergraph approach achieves competitive or superior performance, and it maintains high performance in scenarios with significant levels of missing data. We demonstrate that the hypergraph representation not only offers a more resilient framework for learning from datasets with missing data but also scales effectively across diverse datasets and prediction tasks. Its stable performance under various degrees of missingness demonstrates its potential as a valuable machine learning tool with high data reliability and prediction quality.
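To make the idea of building a hypergraph directly from incomplete data concrete, here is a minimal sketch. The abstract does not specify the construction, so everything in it is an assumption for illustration: rows are treated as nodes, each observed (column, value) pair is treated as a hyperedge, and the function name `build_hypergraph`, the column names, and the toy rows are all hypothetical. It should not be read as the authors' actual method.

```python
from collections import defaultdict


def build_hypergraph(rows, columns):
    """Group row indices by each observed (column, value) pair.

    Assumed construction (not taken from the paper): rows are nodes and
    every observed (column, value) pair is a hyperedge. A missing entry
    (None) is skipped, so that row simply joins no hyperedge for that
    column. Nothing is deleted and nothing is imputed.
    """
    hyperedges = defaultdict(set)
    for i, row in enumerate(rows):
        for col, value in zip(columns, row):
            if value is None:  # missing value: contributes no membership
                continue
            hyperedges[(col, value)].add(i)
    return dict(hyperedges)


# Toy categorical table with missing entries (invented values).
columns = ["sex", "education", "workclass"]
rows = [
    ("F", "BSc", None),
    ("M", None, "private"),
    ("F", "BSc", "private"),
    (None, "MSc", "private"),
]

hg = build_hypergraph(rows, columns)
for edge, members in sorted(hg.items()):
    print(edge, sorted(members))
```

In this sketch every row stays in the graph regardless of how many of its values are missing, which is the property the abstract emphasizes. Continuous features would need binning or another discretization before they could define hyperedges; that step is left out here.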