Abstract: Causal inference from observational data is a cornerstone of decision-making in healthcare, economics, and the social sciences. While deep learning has significantly advanced effect estimation, standard architectures often fail to respect the structural constraints inherent in causal systems, leading to biased results in complex scenarios like proximal inference. In this paper, we introduce the \textbf{DAG-aware Graph Attention Network (GAT)}, a novel neural framework that bridges structural causal modeling with graph representation learning. Unlike traditional Transformers or unconstrained GNNs, our model embeds the causal Directed Acyclic Graph (DAG) as a hard structural inductive bias directly into the attention mechanism. This ensures that information flow strictly adheres to valid causal pathways while preserving the semantic integrity of heterogeneous variables by omitting distortive normalization layers. Extensive experiments on several benchmark datasets show that the DAG-aware GAT consistently outperforms classical non-parametric baselines, modular MLPs, and causally-agnostic graph architectures. By prioritizing causal integrity over generic predictive heuristics, our approach provides a robust and interpretable foundation for reasoning from complex observational data.
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
In response to the reviewers' constructive feedback, we have significantly revised the manuscript to clarify our technical contributions and strengthen the empirical evaluation. The key changes are summarized below:
* **Architectural Reframing (GAT vs. Transformer):** We have reframed our model as a **DAG-aware Graph Attention Network (GAT)**. This terminology more accurately reflects the architecture's use of node-based representations and weighted message passing restricted by an adjacency matrix, rather than a sequence-based Transformer; a minimal sketch of this masked attention appears after this list.
* **Expanded Empirical Baselines:** We have updated **Table 1** to include two critical new baselines:
1. A **Standard GNN** (fully-connected) to demonstrate the necessity of the causal DAG constraint.
2. A **DAG-constrained Transformer with Layer Normalization** to empirically validate our finding that LayerNorm biases the estimation of heterogeneous causal variables.
* **Structural Misspecification Analysis:** We added a new section and analysis (**Section 5 / Table 2**) using the *Demand* dataset to evaluate the model's sensitivity to errors in the causal prior (e.g., reversed or missing edges). We discuss how structural misspecification acts as a "ceiling" for performance, particularly at larger sample sizes.
* **Contextualization and Literature:** We incorporated insights from **Kompa et al. (2022)** to clarify the distinction between modular and unified causal estimation. Unlike the framework in Kompa et al., which respects the DAG by separately estimating bridge functions through independent MLP-based objectives, our **DAG-aware GAT** provides a unified architecture that embeds the entire causal topology directly into the neural message-passing mechanism.
* **Causal Foundation Models:** We added a discussion on the synergy between our framework and emerging Causal Foundation Models (e.g., SEA), positioning our model as a robust estimation engine that can follow automated structural discovery.
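
To make the masking explicit, below is a minimal PyTorch sketch of a DAG-masked attention layer, written under our own hypothetical naming (e.g., `DAGMaskedAttention`) rather than copied from the paper's implementation: attention logits between non-adjacent nodes are set to negative infinity before the softmax, and no normalization layer is applied after the residual update.

```python
import torch

class DAGMaskedAttention(torch.nn.Module):
    """Illustrative single attention layer: node j may attend to node i only
    if the causal DAG contains the edge i -> j, or if i == j. Class and
    parameter names are hypothetical, not taken from the paper."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = torch.nn.Linear(d_model, d_model)
        self.k = torch.nn.Linear(d_model, d_model)
        self.v = torch.nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (num_nodes, d_model) embeddings, one node per causal variable
        # adj: (num_nodes, num_nodes) binary DAG adjacency, adj[i, j] = 1 iff i -> j
        scores = self.q(x) @ self.k(x).T / x.shape[-1] ** 0.5
        # Hard structural mask: row j keeps only j's causal parents and j itself,
        # so information flows strictly along directed edges of the DAG.
        mask = (adj.T + torch.eye(adj.shape[0], device=adj.device)).bool()
        attn = torch.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)
        # Residual update with no LayerNorm, leaving each variable's scale intact.
        return x + attn @ self.v(x)
```

Stacking such layers propagates information only along directed paths of the causal graph, which is the hard structural inductive bias the GAT framing emphasizes.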
Assigned Action Editor: ~Amit_Sharma3
Submission Number: 6254