Propagate and Inject: Revisiting Propagation-Based Feature Imputation for Graphs with Partially Observed Features

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, License: CC BY-NC 4.0
TL;DR: For graphs with missing features, we identify a critical limitation of existing propagation-based imputation methods and propose a novel scheme that overcomes this limitation.
Abstract: In this paper, we address learning tasks on graphs with missing features, broadening the applicability of graph neural networks to real-world graph-structured data. We identify a critical limitation of existing imputation methods based on feature propagation: they tend to produce channels whose imputed values are nearly identical across all nodes, and these low-variance channels contribute very little to performance in graph learning tasks. To overcome this issue, we inject synthetic features that target the root cause of low-variance channels, thereby restoring variance within them. By preventing propagation-based imputation from assigning a nearly constant value to every node in a channel, our synthetic feature propagation scheme mitigates severe performance degradation, even under extreme missing rates. Extensive experiments demonstrate the effectiveness of our approach across various graph learning tasks with missing features, ranging from low to extremely high missing rates. Additionally, we provide both empirical evidence and a theoretical proof of the low-variance problem. The source code is available at https://github.com/daehoum1/fisf.
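The mechanism is easiest to see in code. Below is a minimal NumPy sketch, not the authors' FISF implementation (see the linked repository for that): a standard propagation-based imputer that diffuses observed features and re-clamps them each step, followed by an illustrative injection step that places random synthetic values in channels whose imputed values collapsed to a near constant. The function names, the variance threshold `tau`, the injection fraction `frac`, and the Gaussian noise model are all assumptions made for illustration.

```python
# Minimal sketch of propagation-based imputation and synthetic-feature
# injection for low-variance channels. Illustrative only; NOT the
# authors' FISF code (see https://github.com/daehoum1/fisf).
import numpy as np

def propagate_impute(adj, x, mask, num_iters=40):
    """Diffuse observed features to missing entries (feature propagation).

    adj  : (n, n) symmetric adjacency matrix
    x    : (n, c) feature matrix; values at missing entries are ignored
    mask : (n, c) boolean, True where a feature is observed
    """
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    p = adj / deg                      # row-normalized propagation matrix
    out = np.where(mask, x, 0.0)       # start from observed values
    for _ in range(num_iters):
        out = p @ out                  # diffuse one hop
        out[mask] = x[mask]            # re-clamp observed entries
    return out

def inject_synthetic_features(x_imp, mask, tau=1e-3, frac=0.05, seed=0):
    """Place random synthetic values at a few missing entries of channels
    whose imputed values collapsed to a near constant, and mark them as
    observed so a second propagation pass spreads variance instead of a
    single shared value. (Threshold, fraction, and noise are assumptions.)
    """
    rng = np.random.default_rng(seed)
    x_new, mask_new = x_imp.copy(), mask.copy()
    for c in np.where(x_imp.var(axis=0) < tau)[0]:   # collapsed channels
        miss = np.where(~mask[:, c])[0]
        if miss.size == 0:
            continue
        picked = rng.choice(miss, size=max(1, int(frac * miss.size)),
                            replace=False)
        x_new[picked, c] = rng.normal(0.0, 1.0, size=picked.size)
        mask_new[picked, c] = True                   # treat as pseudo-observed
    return x_new, mask_new

# Usage: impute once, inject variance into collapsed channels, re-impute.
# x_hat = propagate_impute(adj, x, mask)
# x_syn, mask_syn = inject_synthetic_features(x_hat, mask)
# x_final = propagate_impute(adj, x_syn, mask_syn)
```

Clamping the injected entries during the second pass is one plausible way to keep the synthetic values from washing back out to a constant; the actual FISF scheme may handle this differently.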
Lay Summary: Graph neural networks (GNNs) are useful tools that learn from networks, such as social networks, using information about each node and about how nodes are connected. But in the real world, important information about these nodes is often missing. To fill in the gaps, a common approach is to copy known values from neighboring nodes. However, we discovered that when these known values are very similar, this method ends up assigning nearly the same value to every node. As a result, the model struggles to tell nodes apart, like giving every person the same estimated age, making meaningful predictions hard. To solve this problem, we introduce a new method called FISF. It adds a small amount of variety to overly similar parts of the data and spreads this variation across the network, creating more meaningful differences in the filled-in data. Our experiments show that FISF significantly improves the performance of GNNs, even when most of the original information is missing. This allows machine learning models to make better decisions from incomplete data in a wide range of real-world applications.
Link To Code: https://github.com/daehoum1/fisf
Primary Area: Deep Learning->Graph Neural Networks
Keywords: graph neural networks, graphs, missing features
Submission Number: 12446