We introduce $\texttt{GATE}$, a framework for improving the estimation of conditional average treatment effects (CATE) from observational data. Our framework leverages generative models to selectively augment datasets with synthetic potential outcomes, thus addressing the covariate shift problem inherent in CATE estimation. Crucially, $\texttt{GATE}$ enables the integration of external knowledge into downstream CATE models by leveraging generative models trained on external data sources, such as large language models (LLMs). These models utilise rich contextual information, such as dataset metadata, to generate synthetic potential outcomes grounded in real-world contexts. While imperfect generative models can introduce bias, we theoretically demonstrate that restricting augmentation to a carefully chosen subset of the covariate space can yield performance gains despite these imperfections. Empirically, $\texttt{GATE}$ instantiated with LLMs consistently improves a wide range of CATE estimators, narrowing performance gaps between learners and underscoring the advantages of incorporating external knowledge through generative augmentation, particularly in small-sample regimes.
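The abstract does not spell out implementation details, but the core idea, augmenting the observed data with synthetic counterfactual outcomes only on a trusted region of the covariate space before fitting a standard CATE learner, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function names (`gate_style_augment`, `generate_outcome`, `t_learner_cate`), the region mask, and the use of a T-learner are hypothetical stand-ins, not the paper's actual method or API.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def gate_style_augment(X, T, Y, generate_outcome, region_mask):
    """Selectively augment (X, T, Y) with synthetic counterfactual outcomes.

    generate_outcome(x, t) is a hypothetical stand-in for an external
    generative model (e.g. an LLM prompted with dataset metadata) that
    returns a synthetic potential outcome for covariates x under treatment t.
    region_mask is a boolean array marking the covariate subset where
    augmentation is trusted; outside it the data are left untouched.
    """
    X_aug, T_aug, Y_aug = [X], [T], [Y]
    for i in np.where(region_mask)[0]:
        t_cf = 1 - T[i]                        # unobserved (counterfactual) arm
        y_syn = generate_outcome(X[i], t_cf)   # synthetic potential outcome
        X_aug.append(X[i:i + 1])
        T_aug.append(np.array([t_cf]))
        Y_aug.append(np.array([y_syn]))
    return np.vstack(X_aug), np.concatenate(T_aug), np.concatenate(Y_aug)


def t_learner_cate(X, T, Y, X_test):
    """Plain T-learner: fit one outcome model per arm, difference predictions."""
    m1 = RandomForestRegressor().fit(X[T == 1], Y[T == 1])
    m0 = RandomForestRegressor().fit(X[T == 0], Y[T == 0])
    return m1.predict(X_test) - m0.predict(X_test)
```

In this sketch the downstream estimator is unchanged; only its training data are enriched, which is consistent with the abstract's claim that the framework plugs into a wide range of existing CATE estimators.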