Heterogeneous Treatment Effect in Time-to-Event Outcomes: Harnessing Censored Data with Recursively Imputed Trees
TL;DR: A non-parametric method for estimating heterogeneous treatment effects in survival data that addresses the limitations of prior methods under heavy censoring and hidden confounding. We demonstrate its superiority in both the no-hidden-confounders and instrumental variable settings.
Abstract: Tailoring treatments to individual needs is a central goal in fields such as medicine. A key step toward this goal is estimating Heterogeneous Treatment Effects (HTE), that is, how treatments impact different subgroups. While crucial, HTE estimation is challenging with survival data, where time until an event (e.g., death) is key. Existing methods often assume complete observation, an assumption violated in survival data due to right-censoring, leading to bias and inefficiency. Cui et al. (2023) proposed a doubly-robust method for HTE estimation in survival data under no hidden confounders, combining a causal survival forest with an augmented inverse-censoring weighting estimator. However, we find it struggles under heavy censoring, which is common in rare-outcome problems such as amyotrophic lateral sclerosis (ALS). Moreover, most current methods cannot handle instrumental variables, which are a crucial tool in the causal inference arsenal. We introduce Multiple Imputation for Survival Treatment Response (MISTR), a novel, general, and non-parametric method for estimating HTE in survival data. MISTR uses recursively imputed survival trees to handle censoring without directly modeling the censoring mechanism. Through extensive simulations and analysis of two real-world datasets (the AIDS Clinical Trials Group Protocol 175 and the Illinois unemployment dataset), we show that MISTR outperforms prior methods under heavy censoring in the no-hidden-confounders setting, and extends to the instrumental variable setting. To our knowledge, MISTR is the first non-parametric approach for HTE estimation with unobserved confounders via instrumental variables.
Lay Summary: Which cancer treatment would lead to longer survival for each patient? Knowing this would allow us to select the most effective treatment per individual, optimizing outcomes. However, estimating treatment effects is challenging because observed data typically show outcomes for only one treatment per person. Moreover, when the outcome is time until an event like patient death, data are often incomplete due to “right-censoring” – for instance, when subjects drop out or monitoring time is limited.
To address this, we developed MISTR, a new method that constructs multiple datasets with plausible imputations for missing event times. It produces estimates using each imputed dataset and combines them for accurate, personalized treatment effect estimates. Furthermore, most existing approaches require detailed patient characteristics, which are sometimes unavailable. MISTR can bypass this by leveraging “instrumental variables,” a powerful approach for estimating treatment effects without full patient data. We tested MISTR using simulations and real-world medical and economic datasets, demonstrating its superior performance over existing methods, especially under challenging conditions like heavy censoring and unobserved crucial individual characteristics.
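The impute-estimate-combine idea described above can be illustrated with a deliberately simplified sketch. This is not the MISTR implementation: it assumes a randomized treatment, replaces the recursively imputed survival trees with a plain per-arm Kaplan-Meier estimate, and targets a single average effect (the difference in restricted mean survival time) rather than a heterogeneous one. The function names, the `horizon` parameter, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def kaplan_meier(times, events):
    """Return distinct event times and the Kaplan-Meier survival estimate at each."""
    event_times = np.unique(times[events == 1])
    at_risk = np.array([(times >= t).sum() for t in event_times])
    deaths = np.array([((times == t) & (events == 1)).sum() for t in event_times])
    surv = np.cumprod(1.0 - deaths / at_risk)
    return event_times, surv


def impute_once(times, events, horizon, rng):
    """Replace each censored time with a draw from the KM estimate given T > c."""
    event_times, surv = kaplan_meier(times, events)
    # Probability mass the KM estimate places on each observed event time.
    mass = -np.diff(np.concatenate(([1.0], surv)))
    # Mass left beyond the last event time (KM need not reach zero).
    leftover = surv[-1] if surv.size else 1.0
    imputed = times.astype(float).copy()
    for i in np.flatnonzero(events == 0):
        later = event_times > times[i]
        # Leftover mass goes to the horizon, or to c itself if c is beyond it.
        tail_value = max(horizon, times[i])
        candidates = np.concatenate((event_times[later], [tail_value]))
        probs = np.concatenate((mass[later], [leftover]))
        imputed[i] = rng.choice(candidates, p=probs / probs.sum())
    return imputed


def mi_effect(times, events, treated, horizon, n_imputations=10, seed=0):
    """Average, over imputed datasets, the treated-minus-control difference
    in restricted mean survival time (mean of min(T, horizon))."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_imputations):
        arm_means = []
        for arm in (1, 0):
            mask = treated == arm
            completed = impute_once(times[mask], events[mask], horizon, rng)
            arm_means.append(np.minimum(completed, horizon).mean())
        estimates.append(arm_means[0] - arm_means[1])
    return float(np.mean(estimates))


# Hypothetical simulated trial: exponential event times, independent censoring.
sim = np.random.default_rng(1)
n = 2000
treated = sim.integers(0, 2, size=n)
true_times = sim.exponential(np.where(treated == 1, 2.0, 1.0))
censor_times = sim.exponential(1.5, size=n)
times = np.minimum(true_times, censor_times)
events = (true_times <= censor_times).astype(int)

# For this simulation the true restricted-mean difference at horizon 3 is about 0.60.
estimate = mi_effect(times, events, treated, horizon=3.0)
print(f"multiple-imputation estimate: {estimate:.3f}")
```

Each pass through the loop in `mi_effect` plays the role of one imputed dataset, and the final average is the combination step. Sampling censored times from a conditional survival estimate, rather than plugging in a single value, is what lets the variability across imputations reflect uncertainty about the unobserved event times.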
Link To Code: https://github.com/tomer1812/mistr
Primary Area: General Machine Learning->Causality
Keywords: Heterogeneous treatment effect, Causal inference, Time to event data, Survival analysis, Instrumental variable, Rare disease, Multiple imputations
Submission Number: 12947