Toward Generalizability of Graph-based Imputation on Bio-Medical Missing Data

23 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Missing Features, Graph-based Imputation, Tabular data
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Recent work on graph-based imputation methods for missing features has garnered significant attention, largely due to their ability to aggregate and propagate information through graph structures. However, these methods generally assume that the graph structure is readily available, and they manually mask the original features to simulate missing features. These assumptions narrow the applicability of such techniques to real-world tabular data, where no graph structure is given and missing data is prevalent, such as in cases involving confidential patient information. To enhance generalizability, we propose GRASS, which bridges the gap between recent graph-based imputation methods and real-world scenarios in which the data are already missing in their initial state. Specifically, our approach begins with tabular data and employs a simple Multi-Layer Perceptron (MLP) layer to extract feature gradients, which serve as an additional resource for generating graph structures. Leveraging these gradients, we construct a graph from a feature (i.e., column) perspective and carry out column-wise feature propagation to impute missing values based on their similarity to other features. Once the feature matrix is imputed, we generate a second graph, this time from a sample-oriented (i.e., row) perspective, which serves as the input for existing graph-based imputation models. We evaluate GRASS on real-world medical and bio-domain datasets, demonstrating its effectiveness and generalizability across diverse missing-data scenarios.
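The two-graph pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `knn_graph` and `propagate_columns`, the cosine-similarity kNN construction, the mean-aggregation propagation rule, and all parameter values are assumptions chosen for illustration. In particular, the MLP gradient extraction is omitted, and the zero-filled columns in `col_repr` stand in for the gradient-based column representations. The sketch also assumes the features are on a comparable scale.

```python
import numpy as np

def knn_graph(reps, k):
    """Symmetric kNN adjacency from cosine similarity between rows of `reps`."""
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    unit = reps / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)            # a node is never its own neighbour
    n = sim.shape[0]
    nbrs = np.argsort(-sim, axis=1)[:, :k]    # k most similar nodes per row
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)             # symmetrise

def propagate_columns(x, observed, col_adj, n_iter=40):
    """Fill missing entries by repeatedly averaging over neighbouring columns."""
    deg = col_adj.sum(axis=1, keepdims=True)
    p = col_adj / np.maximum(deg, 1e-12)      # row-normalised column graph
    out = np.where(observed, x, 0.0)          # missing entries start at zero
    for _ in range(n_iter):
        out = out @ p.T                       # column j <- mean of its neighbours
        out = np.where(observed, x, out)      # observed entries stay fixed
    return out

# Synthetic table (hypothetical data): columns 3-5 nearly duplicate columns 0-2.
rng = np.random.default_rng(0)
x_full = rng.normal(size=(50, 6))
x_full[:, 3:] = x_full[:, :3] + 0.01 * rng.normal(size=(50, 3))
observed = rng.random(x_full.shape) > 0.2     # roughly 20% of entries missing
x = np.where(observed, x_full, np.nan)

# Step 1: column graph. Placeholder representation, NOT the MLP gradients.
col_repr = np.where(observed, x, 0.0).T
col_adj = knn_graph(col_repr, k=2)

# Step 2: column-wise feature propagation to impute the missing values.
x_imputed = propagate_columns(x, observed, col_adj)

# Step 3: row (sample) graph, which a downstream graph-based model would consume.
row_adj = knn_graph(x_imputed, k=5)
```

Here `x_imputed` and `row_adj` play the role of the imputed feature matrix and the sample-level graph that would be handed to an existing graph-based imputation model; how GRASS actually combines gradients with the features when building the column graph is not specified in the abstract.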
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7215