We introduce $\texttt{GATE}$, a framework for improving the estimation of conditional average treatment effects (CATE) from observational data. Our framework leverages generative models to selectively augment datasets with synthetic potential outcomes, thus addressing the covariate shift problem inherent in CATE estimation. Crucially, $\texttt{GATE}$ enables the integration of external knowledge into downstream CATE models by leveraging generative models trained on external data sources, such as large language models (LLMs). These models utilise rich contextual information, such as dataset metadata, to generate synthetic potential outcomes grounded in real-world contexts. While imperfect generative models can introduce bias, we theoretically demonstrate that restricting augmentation to a carefully chosen subset of the covariate space can yield performance gains despite these imperfections. Empirically, $\texttt{GATE}$ instantiated with LLMs consistently improves a wide range of CATE estimators, narrowing performance gaps between learners and underscoring the advantages of incorporating external knowledge through generative augmentation, particularly in small-sample regimes.
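The abstract does not spell out implementation details, but the core idea, augmenting the observed data with synthetic counterfactual outcomes only on a trusted region of the covariate space before fitting a standard CATE learner, can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the function names (`gate_style_augment`, `generate_outcome`, `t_learner_cate`), the region mask, and the use of a T-learner are hypothetical stand-ins, not the paper's actual method or API.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def gate_style_augment(X, T, Y, generate_outcome, region_mask):
    """Selectively augment (X, T, Y) with synthetic counterfactual outcomes.

    generate_outcome(x, t) is a hypothetical stand-in for an external
    generative model (e.g. an LLM prompted with dataset metadata) that
    returns a synthetic potential outcome for covariates x under treatment t.
    region_mask is a boolean array marking the covariate subset where
    augmentation is trusted; outside it the data are left untouched.
    """
    X_aug, T_aug, Y_aug = [X], [T], [Y]
    for i in np.where(region_mask)[0]:
        t_cf = 1 - T[i]                        # unobserved (counterfactual) arm
        y_syn = generate_outcome(X[i], t_cf)   # synthetic potential outcome
        X_aug.append(X[i:i + 1])
        T_aug.append(np.array([t_cf]))
        Y_aug.append(np.array([y_syn]))
    return np.vstack(X_aug), np.concatenate(T_aug), np.concatenate(Y_aug)


def t_learner_cate(X, T, Y, X_test):
    """Plain T-learner: fit one outcome model per arm, difference predictions."""
    m1 = RandomForestRegressor().fit(X[T == 1], Y[T == 1])
    m0 = RandomForestRegressor().fit(X[T == 0], Y[T == 0])
    return m1.predict(X_test) - m0.predict(X_test)
```

In this sketch the downstream estimator is unchanged; only its training data are enriched, which is consistent with the abstract's claim that the framework plugs into a wide range of existing CATE estimators.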