SemMatch: Semantics-Aware Matching for Causal Inference over Knowledge Graphs

Published: 2024, Last Modified: 24 Jul 2025, WISE (2) 2024, CC BY-SA 4.0
Abstract: Causal inference is used in domains such as healthcare, economics, and political science to infer causal effects from observational data in which each unit (entity) has different properties. Existing approaches often assume data completeness and therefore exclude all units with incomplete data when performing causal inference, which can lead to inaccurate causal estimates. In addition, existing approaches follow the Closed World Assumption, under which facts not present in the database are assumed to be false; this limits their ability to reason over incomplete data. Knowledge graphs (KGs) are data structures that represent data in semi-structured formats and model the meaning of data via ontologies. We propose SemMatch, a KG-based method that enhances causal inference under a data incompleteness assumption. SemMatch relies on a semantic reasoning process, specified by a set of logical rules over KGs, to infer implicit facts and partially address data incompleteness. Then, SemMatch applies machine learning methods to estimate the importance of properties. Finally, SemMatch employs causal estimation methods that take property importance into account, enabling causal reasoning across units with incomplete data to determine the causal effect. We evaluate SemMatch on synthetic datasets and demonstrate that it achieves a lower mean absolute error (MAE) and a lower square root of the precision in estimation of heterogeneous effect (PEHE) in causal effect estimation than existing state-of-the-art methods. The observed results suggest that accounting for semantic reasoning and including units with incomplete data improves causal estimation accuracy.
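The two error measures named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions over per-unit treatment effects: MAE as the mean absolute difference between true and estimated effects, and the square root of PEHE as the root of their mean squared difference. The abstract does not spell out the exact formulas used in the evaluation, so these definitions, the function names `mae` and `sqrt_pehe`, and the toy numbers are illustrative assumptions, not the authors' evaluation code.

```python
import math


def _check(true_effects, estimated_effects):
    # Both sequences must describe the same, non-empty set of units.
    if len(true_effects) != len(estimated_effects) or not true_effects:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(true_effects, estimated_effects):
    """Mean absolute error between true and estimated per-unit effects (assumed definition)."""
    _check(true_effects, estimated_effects)
    total = sum(abs(t - e) for t, e in zip(true_effects, estimated_effects))
    return total / len(true_effects)


def sqrt_pehe(true_effects, estimated_effects):
    """Square root of PEHE: root of the mean squared error in per-unit effects (assumed definition)."""
    _check(true_effects, estimated_effects)
    total = sum((t - e) ** 2 for t, e in zip(true_effects, estimated_effects))
    return math.sqrt(total / len(true_effects))


# Hypothetical toy values: on synthetic data the true effects are known by construction.
true_tau = [1.0, 2.0, 3.0, 4.0]
est_tau = [1.5, 2.0, 2.0, 4.5]

print(f"MAE       = {mae(true_tau, est_tau):.4f}")
print(f"sqrt PEHE = {sqrt_pehe(true_tau, est_tau):.4f}")
```

Both measures need the true per-unit effect, which is only available when the data-generating process is known; this is consistent with the evaluation being carried out on synthetic datasets.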