Keywords: causal inference, composite model, heterogeneous-indexed data, graphical causal models
TL;DR: Composite Graphical Causal Models (CGCMs) preserve heterogeneous data indexing by embedding resampling and aggregation within interconnected causal models, eliminating the need for homogenizing preprocessing and improving predictive accuracy and interpretability.
Abstract: Complex real-world systems typically consist of multiple interdependent subsystems, each of which can operate under a different sampling reference. Consequently, the data collected across these subsystems can differ in indexing (e.g., time-, distance-, or event-indexed) or sampling frequency, or can suffer from index misalignment. Surrogate models can approximate such systems and serve as computationally inexpensive replacements during optimization, sensitivity analysis, uncertainty quantification, or interpretation. Conventional modeling approaches require these datasets to be unified into a single, uniformly indexed table via preprocessing steps such as aggregation and merging.
In this work, we introduce Composite Graphical Causal Models (CGCMs), a novel approach that preserves the original indexing of each data table during both training and inference. By embedding resampling and aggregation operations directly within a graphical causal model (GCM), our method eliminates the need for data homogenization as a preprocessing step. Specifically, a set of GCMs is employed, each tailored to a distinct indexing, and these are connected by aggregation functions that model cross-index dependencies. As validated on synthetic datasets, this design enables more representative modeling of heterogeneous-indexed processes, improving predictive performance and interpretability.
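The idea of linking differently indexed tables through an aggregation function can be illustrated with a minimal sketch. This is not the authors' implementation: the tables, the column meanings, the names `aggregate_mean`, `fit_line`, and `predict_quality`, and the use of a plain least-squares line as the coarse-index mechanism are all assumptions made for illustration. The point is only that the time-indexed table stays in its original form and the aggregation is applied inside the model, at both fitting and prediction time.

```python
from statistics import mean

# Hypothetical fine-grained table: (time_s, temperature), indexed by time.
sensor = [(t, 20.0 + 0.5 * t) for t in range(12)]

# Hypothetical coarse table: (window_start, window_end, quality), indexed by event.
events = [(0, 4, 42.0), (4, 8, 46.0), (8, 12, 50.0)]


def aggregate_mean(rows, start, end):
    """Cross-index link: mean of fine-grained values with start <= t < end."""
    return mean(v for t, v in rows if start <= t < end)


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b (stand-in causal mechanism)."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return a, my - a * mx


# Training: aggregation happens inside the model, not as a preprocessing merge.
agg = [aggregate_mean(sensor, s, e) for s, e, _ in events]
quality = [q for _, _, q in events]
slope, intercept = fit_line(agg, quality)


def predict_quality(rows, start, end):
    """Inference reuses the same aggregation on the raw time-indexed rows."""
    return slope * aggregate_mean(rows, start, end) + intercept
```

Because the aggregation step is part of the model, swapping the mean for another summary (for example a maximum or a last value) changes only `aggregate_mean`, while both tables keep their native indexing.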
Pmlr Agreement: pdf
Submission Number: 57