Data-Driven Variable Decomposition for Treatment Effect Estimation

Kun Kuang, Peng Cui, Hao Zou, Bo Li, Jianrong Tao, Fei Wu, Shiqiang Yang

IEEE Trans. Knowl. Data Eng., 2022 (modified: 31 Jan 2023)

Abstract: Causal inference plays an important role in decision making in many fields, such as social marketing, healthcare, and public policy. One fundamental problem in causal inference is estimating treatment effects in observational studies when variables are confounded. Confounding is commonly controlled for with the propensity score, but propensity-score methods treat all observed variables as confounders and ignore adjustment variables, which have no influence on the treatment but are predictive of the outcome. Recently, it has been demonstrated that adjustment variables are effective in reducing the variance of the estimated treatment effect. However, how to automatically separate confounders from adjustment variables in observational studies is still an open problem, especially with high-dimensional variables, which are common in the big data era. In this paper, we first propose a Data-Driven Variable Decomposition (D²VD) algorithm, which can 1) automatically separate confounders and adjustment variables with a data-driven approach, and 2) simultaneously estimate the treatment effect in observational studies with high-dimensional variables. Under standard assumptions, we theoretically prove that our D²VD algorithm yields an unbiased estimate of the treatment effect and achieves lower variance than traditional propensity-score-based methods. Moreover, to address the challenges posed by high-dimensional variables and nonlinearity, we extend D²VD to a nonlinear version, the Nonlinear-D²VD (N-D²VD) algorithm.
To validate the effectiveness of the proposed algorithms, we conduct extensive experiments on both synthetic and real-world datasets. The experimental results demonstrate that D²VD and N-D²VD can automatically and precisely separate the variables, and estimate treatment effects more accurately and with tighter confidence intervals than state-of-the-art methods. We also demonstrate that the top-ranked features selected by our algorithm achieve the best prediction performance on an online advertising dataset.
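The abstract makes two claims: confounders and adjustment variables can be told apart from data, and using adjustment variables lowers the variance of a treatment effect estimate. A small simulation can illustrate both. The sketch below is NOT the D²VD algorithm. It rests on several assumptions, all illustrative: a hypothetical linear data-generating process with a true average treatment effect (ATE) of 1.0, a known propensity score, a naive correlation-threshold screen in place of the paper's data-driven decomposition, and plain least-squares residualization of the outcome before inverse propensity weighting (IPW). The names `simulate`, `naive_split`, `ipw`, and `ipw_residualized` and the threshold of 0.1 are hypothetical choices made for this example.

```python
import numpy as np

TRUE_ATE = 1.0  # assumed true effect in this hypothetical simulation


def simulate(n, rng):
    # Hypothetical data-generating process (illustrative assumption):
    # columns 0-1 are confounders (affect treatment and outcome),
    # columns 2-6 are adjustment variables (affect the outcome only).
    v = rng.normal(size=(n, 7))
    p = 1.0 / (1.0 + np.exp(-(v[:, 0] - 0.5 * v[:, 1])))  # true propensity score
    t = rng.binomial(1, p)
    coef = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0])
    y = TRUE_ATE * t + v @ coef + rng.normal(size=n)
    return v, p, t, y


def naive_split(v, t, y, thresh=0.1):
    # Naive screening heuristic (a stand-in, NOT the D2VD procedure):
    # correlated with treatment -> confounder;
    # correlated with outcome but not treatment -> adjustment variable.
    k = v.shape[1]
    corr_t = np.array([np.corrcoef(v[:, j], t)[0, 1] for j in range(k)])
    corr_y = np.array([np.corrcoef(v[:, j], y)[0, 1] for j in range(k)])
    confounders = np.flatnonzero(np.abs(corr_t) >= thresh)
    adjustment = np.flatnonzero((np.abs(corr_t) < thresh) & (np.abs(corr_y) >= thresh))
    return confounders, adjustment


def ipw(y, t, p):
    # Horvitz-Thompson style IPW estimate of the ATE; unbiased here
    # because the simulation supplies the true propensity score p.
    return np.mean(t * y / p - (1 - t) * y / (1 - p))


def ipw_residualized(v, p, t, y, adjustment):
    # Remove the part of y linearly explained by the adjustment variables,
    # then apply IPW to the residualized outcome.
    design = np.column_stack([np.ones(len(y)), v[:, adjustment]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return ipw(y - v[:, adjustment] @ beta[1:], t, p)


# Repeat the simulation to compare the spread of the two estimators.
rng = np.random.default_rng(0)
plain, adjusted = [], []
for _ in range(200):
    v, p, t, y = simulate(2000, rng)
    _, adjustment = naive_split(v, t, y)
    plain.append(ipw(y, t, p))
    adjusted.append(ipw_residualized(v, p, t, y, adjustment))
plain, adjusted = np.array(plain), np.array(adjusted)

print(f"plain IPW:        mean {plain.mean():.3f}, sd {plain.std():.3f}")
print(f"residualized IPW: mean {adjusted.mean():.3f}, sd {adjusted.std():.3f}")
```

In this setup both estimators target the same ATE. The residualized one should show a clearly smaller standard deviation across repetitions, because the outcome variation explained by the adjustment variables no longer enters the weighted average. The correlation screen is only a stand-in. It can misclassify variables when effects are weak or nonlinear, and it is not how the paper performs the decomposition.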
