Causal Regressions For Unstructured Data

Published: 27 Oct 2023, Last Modified: 12 Dec 2023, CRL@NeurIPS 2023 Poster
Keywords: Causal Learning, Riesz Representer, Adversarial Generalized Method of Moments, Instrumental Variables, Generative AI
TL;DR: This paper introduces "RieszIV," a new estimator that tackles challenges in the causal study of unstructured data and endogeneity with observational data.
Abstract: Much recent research in economics and marketing has focused on (1) allowing for unstructured data in causal studies and (2) flexibly addressing endogeneity in observational data so that causal inference remains valid. Directly using machine learning algorithms to predict the outcome variable can help handle unstructured data; however, it is well known that such an approach performs poorly when the explanatory variables are endogenous. On the other hand, extant methods for addressing endogeneity make strong parametric assumptions and hence cannot “directly” incorporate high-dimensional unstructured data. In this paper, we propose an estimator, which we term “RieszIV”, for carrying out estimation and inference with high-dimensional observational data without resorting to parametric approximations. We show that our estimator is consistent and asymptotically normal under a mild set of conditions. We carry out extensive Monte Carlo simulations with both low-dimensional and high-dimensional unstructured data to demonstrate the finite-sample performance of our estimator. Finally, using app download and review data for apps on Google Play, we demonstrate how our method can be used to conduct inference on counterfactual policies defined over rich text data. We show how large language models can serve as a viable counterfactual policy generation operator. This represents an important advance in expanding counterfactual inference to complex, real-world settings.
Submission Number: 48
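
Illustration: the abstract notes that directly predicting the outcome performs poorly when an explanatory variable is endogenous. The minimal sketch below shows that effect in a textbook linear setting. It is not the RieszIV estimator and is not taken from the paper; the data-generating process, the coefficient values, and the function names (`simulate`, `ols_slope`, `iv_slope`) are all assumptions made for illustration. It compares a naive regression slope with a simple instrumental-variable (Wald) slope.

```python
import numpy as np

def simulate(n=20000, beta=2.0, seed=0):
    """Hypothetical linear model with one endogenous regressor (assumed DGP)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                    # instrument: shifts x, unrelated to u
    u = rng.normal(size=n)                    # unobserved confounder
    x = 0.8 * z + u + rng.normal(size=n)      # x is endogenous because it depends on u
    y = beta * x + 2.0 * u + rng.normal(size=n)
    return z, x, y

def ols_slope(x, y):
    """Naive slope from regressing y on x: cov(x, y) / var(x)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

def iv_slope(z, x, y):
    """Simple IV (Wald) slope: cov(z, y) / cov(z, x)."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (x - x.mean())))

if __name__ == "__main__":
    z, x, y = simulate()
    print("true beta :", 2.0)
    print("naive OLS :", ols_slope(x, y))     # biased upward, since u raises both x and y
    print("simple IV :", iv_slope(z, x, y))   # close to the true beta in large samples
```

In this assumed setup the confounder `u` raises both `x` and `y`, so the naive slope overstates the effect, while the instrument `z` isolates variation in `x` that is unrelated to `u`. The paper's setting replaces this linear, low-dimensional model with high-dimensional unstructured data and avoids parametric approximations.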