A Computationally Efficient Case-Control Sampling Framework for G-Formula with Longitudinal Data

ICLR 2026 Conference Submission 22123 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Causal inference, time-varying treatment, survival analysis, rare outcomes, case-control sampling
TL;DR: We propose a case-control enhanced g-formula approach to efficiently estimate causal effects of time-varying treatments on rare survival outcomes.
Abstract: Estimating the causal effect of time-varying treatments on survival outcomes in large observational studies is computationally demanding, particularly when outcomes are rare. The iterative conditional expectation (ICE) estimator within the g-formula framework is effective but becomes computationally burdensome when bootstrapping is used for variance estimation. In addition, the rarity of outcomes at each time point can create extreme class imbalance, leading to instability and convergence problems in logistic regression and related models. To address these challenges, we propose a novel case-control enhanced g-formula approach that integrates case-control sampling with ICE estimation. By selecting informative subsets of the data and applying appropriate reweighting, the approach mitigates class imbalance, improves estimation stability, and substantially reduces computational cost while preserving consistency and asymptotic efficiency. We evaluate the method through simulations and validate it on a large-scale electronic health record (EHR) cohort study of social and behavioral determinants of health (SBDH) and suicide risk, demonstrating its effectiveness for modeling rare outcomes in longitudinal data.
Supplementary Material: zip
Primary Area: causal reasoning
Submission Number: 22123
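
The abstract describes selecting informative subsets of the data and reweighting them. The sketch below illustrates that general idea for a single rare-outcome logistic regression only. It is not the authors' ICE procedure. The sampling design (keep every case, keep each control with a fixed probability), the inverse-sampling-probability weights, the simulated data, and the names `case_control_sample` and `weighted_logistic_fit` are all assumptions made for illustration.

```python
import numpy as np


def case_control_sample(y, control_rate, rng):
    """Keep every case (y == 1) and each control (y == 0) with prob. control_rate.

    Returns the kept row indices and inverse-sampling-probability weights.
    (Hypothetical helper; the paper's actual sampling scheme may differ.)
    """
    y = np.asarray(y)
    keep = (y == 1) | (rng.random(y.shape[0]) < control_rate)
    idx = np.flatnonzero(keep)
    weights = np.where(y[idx] == 1, 1.0, 1.0 / control_rate)
    return idx, weights


def weighted_logistic_fit(x, y, w, n_iter=25):
    """Weighted logistic regression by Newton-Raphson; an intercept is added."""
    design = np.column_stack([np.ones(len(y)), x])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        grad = design.T @ (w * (y - p))
        hess = (design * (w * p * (1.0 - p))[:, None]).T @ design
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Simulated rare outcome (assumed data-generating model, prevalence under 1%).
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
true_beta = np.array([-5.0, 0.7])
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x)))
y = rng.binomial(1, p_true)

# Fit on the reweighted subsample instead of the full cohort.
idx, w = case_control_sample(y, control_rate=0.05, rng=rng)
beta_hat = weighted_logistic_fit(x[idx], y[idx], w)
print(f"rows used: {len(idx)} of {n}")
print("estimated coefficients:", beta_hat)
```

In this toy setting the model is fit on only a small fraction of the rows, and the weights make the subsample stand in for the full cohort, so the coefficient estimates target the same values as a full-data fit. How the subsampling interacts with the iterated regressions of the ICE estimator is specific to the paper and is not captured here.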