Exploring Transformer Backbones for Heterogeneous Treatment Effect Estimation
Abstract: Previous works on Treatment Effect Estimation (TEE) are not in widespread use because they are predominantly theoretical: they make strong parametric assumptions that are intractable in practical applications. Recent works use Multilayer Perceptrons (MLPs) to model causal relationships; however, MLPs lag far behind recent advances in ML methodology, which limits their applicability and generalizability. To extend beyond the single-domain formulation and towards more realistic learning scenarios, we explore model design spaces beyond MLPs, i.e., transformer backbones, in which attention layers govern interactions among treatments and covariates and exploit structural similarities of potential outcomes for confounding control. Through careful model design, we propose Transformers as Treatment Effect Estimators (TransTEE). We show empirically that TransTEE can: (1) serve as a general-purpose treatment effect estimator that significantly outperforms competitive baselines on a variety of challenging TEE problems (e.g., discrete, continuous, structured, or dosage-associated treatments) and is applicable both when covariates are tabular and when they consist of structural data (e.g., texts, graphs); and (2) yield multiple advantages: compatibility with propensity score modeling, parameter efficiency, robustness to continuous treatment value distribution shifts, explainability in covariate adjustment, and real-world utility in auditing pre-trained language models.
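As a rough illustration of the idea that attention layers govern interactions among treatments and covariates, the sketch below computes a single scaled dot-product cross-attention step in plain Python. It is not the TransTEE implementation: the choice of the treatment embedding as the query and the covariate embeddings as keys and values is an assumption, and the names `cross_attention`, `treatment_query`, and `covariate_embeddings`, as well as the toy vectors, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    Assumption for illustration: `query` is a treatment embedding and
    `keys`/`values` are per-covariate embeddings.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, k) / scale for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of the value vectors, one output coordinate at a time.
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context


# Toy inputs (hypothetical): one treatment embedding, three covariate embeddings.
treatment_query = [1.0, 0.0]
covariate_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

weights, context = cross_attention(
    treatment_query, covariate_embeddings, covariate_embeddings
)
print(weights)
print(context)
```

In this toy setup, the attention weights indicate how strongly each covariate embedding contributes to the treatment-conditioned representation, which is one way such weights could be inspected when studying covariate adjustment.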
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. Writing Clarity and Precision: We have carefully reviewed the manuscript and rephrased statements that were previously deemed overly assertive or lacked precision. The writing has been refined to ensure that our claims are accurately represented and well-supported by evidence.
2. Adversarial Objective Enhancement: We have revisited the adversarial objective used in our approach and made improvements to address the concerns raised by one of the reviewers. The revised objective is better explained and the rationale behind its formulation is clarified.
3. Experiments Section Improvement: The experiments section has been updated based on the reviewer's suggestions. We have provided more comprehensive details about the experiments, including a clearer description of the setup, datasets, and evaluation metrics. Additionally, we have addressed any ambiguity in the presentation of experimental results.
4. Theoretical Justification Clarification: The theoretical justification for the proposed loss function has been thoroughly revised to improve clarity and eliminate any potential confusion. We have restructured the theory section to provide a more intuitive and coherent explanation of our approach, incorporating suggestions from the reviewer.
5. Notation and Methods Clarification: The notation used throughout the paper has been refined to ensure consistency and eliminate any ambiguities. We have provided more intuitive explanations of key concepts and methods, making them accessible to a wider audience.
6. Acknowledgment of Limitation: We have acknowledged the reliance on the ignorability assumption as a limitation of our model. This acknowledgment is accompanied by a discussion of the implications of this limitation on the applicability and interpretation of our proposed method.
Assigned Action Editor: ~Tie-Yan_Liu1
Submission Number: 966