Abstract: Causal inference methods that control for text-based confounders are becoming increasingly important in the social sciences and other disciplines where text is readily available. However, these methods rely on a critical assumption that there is no treatment leakage: that is, the text contains only information about the confounder and no information about treatment assignment, which would otherwise lead to post-treatment bias. Yet this assumption may be unrealistic in real-world settings involving text, because human language is rich and flexible. We first define the leakage problem, discussing the identification and estimation challenges it raises. We also discuss the conditions under which leakage can be addressed by removing the treatment-related signal from the text in a pre-processing step we define as text distillation. Then, using simulation, we investigate the mechanics of treatment leakage and its effect on estimates of the average treatment effect (ATE).
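To make the leakage mechanism concrete, here is a minimal sketch. It is not the authors' simulation design; it assumes a hypothetical linear data-generating process in which a scalar `x` stands in for a representation of the text, and the names `simulate_ate_estimate`, `leak`, `tau`, and `beta` are illustrative. When `x` carries treatment signal (`leak > 0`), adjusting for it by OLS recovers roughly `tau - beta * leak` instead of the true effect `tau`.

```python
import numpy as np


def simulate_ate_estimate(leak: float, n: int = 100_000, tau: float = 1.0,
                          beta: float = 2.0, seed: int = 0) -> float:
    """Estimate the ATE by OLS while adjusting for a scalar text proxy X.

    Hypothetical data-generating process (an assumption for illustration):
      Z ~ N(0, 1)                        confounder
      T ~ Bernoulli(sigmoid(Z))          treatment depends on the confounder
      X = Z + leak * T                   text proxy; leak > 0 means leakage
      Y = tau * T + beta * Z + noise     outcome with true ATE = tau
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    t = (rng.random(n) < 1.0 / (1.0 + np.exp(-z))).astype(float)
    x = z + leak * t
    y = tau * t + beta * z + rng.normal(size=n)
    # Regress Y on [1, T, X]; the coefficient on T is the adjusted ATE estimate.
    design = np.column_stack([np.ones(n), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])


if __name__ == "__main__":
    print(simulate_ate_estimate(leak=0.0))  # approximately tau = 1.0
    print(simulate_ate_estimate(leak=0.5))  # approximately tau - beta * leak = 0.0
```

Under this linear model the bias follows directly from substituting Z = X - leak * T into the outcome equation: Y = (tau - beta * leak) * T + beta * X + noise. Conditioning on a leaky proxy therefore absorbs part of the treatment effect into the coefficient on X.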