The Final-Stage Bottleneck: A Systematic Dissection of the R-Learner for Network Causal Inference

TMLR Paper 6646 Authors

25 Nov 2025 (modified: 07 Dec 2025) · Under review for TMLR · CC BY 4.0
Abstract: The R-Learner is a powerful, theoretically grounded framework for estimating heterogeneous treatment effects, prized for its robustness to errors in the nuisance models. However, its application to network data, where causal heterogeneity may be driven by graph structure, poses critical and underexplored challenges to its core assumption of a well-specified final-stage model. In this paper, we conduct a large-scale, multi-seed empirical study that systematically dissects the R-Learner framework on graphs. Our results suggest that for network-dependent effects, a critical driver of performance is the inductive bias of the final-stage CATE estimator, a factor whose importance can dominate that of the nuisance models. Our central finding is a systematic quantification of a "representation bottleneck": we demonstrate empirically, and through a constructive theoretical example, that graph-blind final-stage estimators, being theoretically misspecified, exhibit substantial underperformance (MSE > 4.0, p < 0.001 across all settings). Conversely, we show that an R-Learner with a correctly specified, end-to-end graph-aware architecture (the "Graph R-Learner") achieves significantly lower error. Furthermore, we provide a comprehensive analysis of the framework's properties. We identify a subtle "nuisance bottleneck" and offer a mechanistic explanation for its topology dependence: on hub-dominated graphs, graph-blind nuisance models can partially capture concentrated confounding signals, whereas on graphs with diffuse structure, a GNN's explicit aggregation becomes critical. This analysis is supported by a "Hub-Periphery Tradeoff," which we connect to the GNN over-squashing phenomenon. Our findings are validated across diverse synthetic and semi-synthetic benchmarks, where the R-Learner framework also significantly outperforms a strong, non-DML GNN T-Learner baseline. We release our code as a comprehensive, reproducible benchmark to facilitate future research on this critical "final-stage bottleneck."
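To make the two-stage structure the abstract refers to concrete, the sketch below illustrates a standard R-Learner with cross-fitted nuisance models and a swappable final-stage CATE regressor. It is a minimal, hypothetical illustration under stated assumptions (binary treatment, scikit-learn forests as placeholder nuisance models, and an illustrative r_learner function and eps constant), not the authors' released benchmark; in the paper's Graph R-Learner, the graph-blind final stage would instead be a GNN trained on the same weighted objective with access to the adjacency structure.

```python
# Minimal R-Learner sketch for illustration only (hypothetical code, not the
# authors' released benchmark). Assumes binary treatment T, outcome Y, and a
# node-feature matrix X; the model choices and eps are placeholder assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def r_learner(X, T, Y, final_stage=None, n_folds=5, eps=1e-3):
    """Residual-on-residual (R-Learner) CATE estimation with cross-fitting."""
    # Stage 1 (nuisance models): cross-fitted estimates of
    # m(x) = E[Y | X = x] and the propensity e(x) = P(T = 1 | X = x).
    m_hat = cross_val_predict(RandomForestRegressor(), X, Y, cv=n_folds)
    e_hat = cross_val_predict(RandomForestClassifier(), X, T, cv=n_folds,
                              method="predict_proba")[:, 1]

    # Residualize both outcome and treatment (the "R" decomposition).
    y_res = Y - m_hat
    t_res = T - e_hat
    # Keep the treatment residual away from zero before dividing by it.
    t_res = np.where(t_res >= 0,
                     np.maximum(t_res, eps),
                     np.minimum(t_res, -eps))

    # Stage 2 (the "final-stage bottleneck"): weighted regression of the
    # pseudo-outcome on X. A graph-blind regressor here is what the paper
    # argues is misspecified for network-dependent effects; the Graph
    # R-Learner would fit a GNN to the same weighted objective instead.
    pseudo_outcome = y_res / t_res
    weights = t_res ** 2
    cate_model = final_stage if final_stage is not None else RandomForestRegressor()
    cate_model.fit(X, pseudo_outcome, sample_weight=weights)
    return cate_model  # cate_model.predict(X_new) estimates tau(x)
```

The weighted-regression form in stage 2 is a standard identity: regressing the pseudo-outcome (Y - m̂)/(T - ê) on X with sample weights (T - ê)^2 minimizes the squared R-loss sum_i ((Y_i - m̂(X_i)) - (T_i - ê(X_i)) · τ(X_i))^2, which is why any regressor that accepts sample weights, graph-blind or graph-aware, can serve as the final stage.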
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Jiwei_Zhao1
Submission Number: 6646