A Free Lunch with Influence Functions? An Empirical Evaluation of Influence Functions for Average Treatment Effect Estimation

Published: 01 Apr 2023, Last Modified: 01 Apr 2023. Accepted by TMLR.
Abstract: The applications of causal inference may be life-critical, including the evaluation of vaccinations, medicine, and social policy. However, when undertaking estimation for causal inference, practitioners rarely have access to what might be called 'ground truth' in a supervised learning setting, meaning the chosen estimation methods cannot be evaluated and must be assumed to be reliable. It is therefore crucial that we have a good understanding of the performance consistency of typical methods available to practitioners. In this work we provide a comprehensive evaluation of recent semiparametric methods (including neural network approaches) for average treatment effect estimation. Such methods have been proposed as a means to derive unbiased causal effect estimates and statistically valid confidence intervals, even when using otherwise non-parametric, data-adaptive machine learning techniques. We also propose a new estimator, 'MultiNet', and a variation on the semiparametric update step, 'MultiStep', which we evaluate alongside existing approaches. The performance of both semiparametric and 'regular' methods is found to be dataset dependent, indicating an interaction between the methods used, the sample size, and the nature of the data-generating process. Our experiments highlight the need for practitioners to check the consistency of their findings, potentially by undertaking multiple analyses with different combinations of estimators.
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=E2lP5gsSsF&referrer=%5BTMLR%5D(%2Fgroup%3Fid%3DTMLR)
Changes Since Last Submission: Dear all, Many thanks for your patience. It took quite some time to re-run all simulations with the additional method (DML) and dataset variants. We include an updated version of the paper with the following changes:
- Unmarginalised results for a new performance metric (absolute error on the estimation of the ATE), $\sum_{i=0}^{r}|\tau_i - \hat{\tau}_i|$, in the appendix
- Unmarginalised results for the double machine learning (DML) method in the appendix
- Marginalised, SHAP, and unmarginalised results for a new, more general dataset 'Gen' with three sample sizes: 500, 5000, and 10000
- Updated SHAP interaction plots (easier to read)
- Split the SHAP interaction variable 'dataset' into the 'dataset' type (LF v1, v2, Gen, IHDP) and 'sample size' (500, 5000, 10000, 747) so that we can disentangle the structural and functional form from the sample size
- Updated the discussion of the results (once again, they are still mixed, although the SHAP interaction plots highlight some particular interactions between certain methods and the sample size)
- Included a discussion concerning how practitioners should choose estimators
- Clarified a handful of points according to review comments (e.g. what we meant by a 'correctly specified estimator', and how our use of the word 'consistent' depends on the context)
We thank the reviewers and editors for their comments and feedback, which have surely strengthened the paper, particularly in regard to the scope and thoroughness of the empirical evaluation.
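As a rough illustration of the absolute-error metric mentioned in the changes above, the following minimal sketch computes $\sum_{i}|\tau_i - \hat{\tau}_i|$ in plain Python. It assumes that $i$ indexes repeated simulation runs, that $\tau_i$ is the true ATE and $\hat{\tau}_i$ the estimate for run $i$; the function name `total_absolute_ate_error` and its arguments are hypothetical and are not taken from the paper or its repository.

```python
from typing import Sequence


def total_absolute_ate_error(
    true_ates: Sequence[float], estimated_ates: Sequence[float]
) -> float:
    """Sum of |tau_i - tau_hat_i| over runs (hypothetical helper, not the authors' code)."""
    if len(true_ates) != len(estimated_ates):
        raise ValueError("true_ates and estimated_ates must have the same length")
    # Accumulate the absolute difference between true and estimated ATE per run.
    return float(sum(abs(t - e) for t, e in zip(true_ates, estimated_ates)))


if __name__ == "__main__":
    # Toy values chosen purely for illustration; they are not results from the paper.
    true_vals = [1.0, 2.0, 3.0]
    est_vals = [1.5, 1.5, 3.0]
    print(total_absolute_ate_error(true_vals, est_vals))
```

Dividing the returned sum by the number of runs would give a mean absolute error instead; whether the paper reports the sum or the mean is not specified here.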
Code: https://github.com/matthewvowels1/FreeLunchSemiParametrics/
Assigned Action Editor: ~Mingming_Gong1
Submission Number: 588