Generalizing Trimming Bounds for Endogenously Missing Outcome Data Using Random Forests

Cyrus Samii, Ye Wang, Junlong Aaron Zhou

Published: 26 Jun 2025. Last Modified: 27 Nov 2025. Venue: Political Analysis. License: CC BY-SA 4.0.
Abstract: We present a method for narrowing nonparametric bounds on treatment effects by adjusting for potentially large numbers of covariates, using generalized random forests. In many experimental or quasi-experimental studies, outcomes of interest are only observed for subjects who select (or are selected) to engage in the activity generating the outcome. Outcome data are thus endogenously missing for units who do not engage, and random or conditionally random treatment assignment before such choices is insufficient to identify treatment effects. Nonparametric partial identification bounds address endogenous missingness without having to make disputable parametric assumptions. Basic bounding approaches often yield bounds that are wide and minimally informative. Our approach can tighten such bounds while permitting agnosticism about the data-generating process and honest inference. A simulation study and replication exercise demonstrate the benefits.
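The trimming logic the abstract builds on can be illustrated with a small sketch. It assumes randomized treatment and monotone selection, meaning treatment never causes a unit to go unobserved. Under those assumptions, basic Lee-style bounds trim the observed treated outcomes by the excess observation rate (q1 - q0)/q1, from the top for the lower bound and from the bottom for the upper bound, and compare the trimmed mean with the observed control mean. The second function is a deliberately simplified stand-in for covariate adjustment. It averages bounds within a few discrete strata instead of estimating conditional trimming quantities with generalized random forests, so it is not the authors' estimator and it omits inference entirely. The function names and toy data are hypothetical.

```python
import numpy as np


def trimming_bounds(y, d, s):
    """Basic trimming bounds on the treatment effect for always-observed units.

    Assumes d is randomly assigned and that treatment never causes a unit
    to go unobserved (monotonicity), so the observed share among treated
    units (q1) is at least the observed share among controls (q0).
    y: outcomes (entries where s == 0 are ignored); d, s: 0/1 arrays.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    s = np.asarray(s, dtype=int)
    q1 = s[d == 1].mean()  # share observed among treated
    q0 = s[d == 0].mean()  # share observed among controls
    if q1 == 0 or q1 < q0:
        raise ValueError("need q1 > 0 and q1 >= q0 (monotonicity)")
    p = (q1 - q0) / q1  # share of observed treated units to trim
    y1 = np.sort(y[(d == 1) & (s == 1)])
    y0_mean = y[(d == 0) & (s == 1)].mean()
    keep = max(int(round(len(y1) * (1 - p))), 1)  # treated units retained
    lower = y1[:keep].mean() - y0_mean   # trim the largest treated outcomes
    upper = y1[-keep:].mean() - y0_mean  # trim the smallest treated outcomes
    return lower, upper


def stratified_trimming_bounds(y, d, s, x):
    """Average within-stratum bounds over discrete covariate strata x.

    Simplified stand-in (not the forest-based method): each stratum is
    weighted by its share of observed controls, which under random
    assignment and monotonicity estimates its share of the
    always-observed subgroup.
    """
    y, d, s, x = (np.asarray(a) for a in (y, d, s, x))
    ctrl_obs = (d == 0) & (s == 1)
    lower = upper = 0.0
    for level in np.unique(x):
        m = x == level
        w = ctrl_obs[m].sum() / ctrl_obs.sum()
        if w == 0:
            continue  # no always-observed units estimated in this stratum
        lo, hi = trimming_bounds(y[m], d[m], s[m])
        lower += w * lo
        upper += w * hi
    return lower, upper


if __name__ == "__main__":
    # Toy data: stratum A has no differential attrition; in stratum B one
    # control outcome is missing (its y entry is only a placeholder).
    x = ["A", "A", "A", "A", "B", "B", "B", "B"]
    d = [1, 1, 0, 0, 1, 1, 0, 0]
    s = [1, 1, 1, 1, 1, 1, 1, 0]
    y = [10, 12, 9, 11, 0, 2, 1, 0]
    lo, hi = trimming_bounds(y, d, s)
    print(f"basic:      [{lo:.3f}, {hi:.3f}]")  # basic:      [-3.000, 1.000]
    lo, hi = stratified_trimming_bounds(y, d, s, x)
    print(f"stratified: [{lo:.3f}, {hi:.3f}]")  # stratified: [0.333, 1.000]
```

In this toy example, conditioning on the stratum confines the trimming to stratum B, which narrows the interval. With many or continuous covariates, discrete strata quickly become too sparse for this approach, which is the setting the abstract targets with generalized random forests.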