Keywords: causality, PU learning, machine learning, observational studies
Abstract: In causal inference, access to both treated and control units is essential for estimating treatment effects. When treatment assignment is random, the average treatment effect (ATE) can be estimated by comparing outcomes between groups. In non-randomized settings, techniques adjust for confounding to approximate the counterfactual and recover an unbiased ATE. A common challenge in observational studies is the absence of clearly labeled control units. To address this, we propose positive-unlabeled (PU) learning to identify control units from unlabeled data using only treated (positive) units. We evaluate this approach with simulated and real-world data, generating synthetic data from a causal graph to test the recovery of control groups for accurate ATE estimation. Applied to sustainable agriculture data, PU learning effectively distinguishes control units, enabling ATE estimates that closely match true effects. These results demonstrate PU learning’s potential to enhance causal inference in settings lacking explicit control data. This work has important implications for observational causal inference, especially in fields like Earth, environmental, and agricultural sciences, where randomized experiments are costly and control units may be unavailable.
Submission Number: 173
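The pipeline described in the abstract (score units, pick likely controls from the unlabeled pool, then compare outcomes) can be illustrated with a minimal sketch. The abstract does not say which PU learning algorithm is used, so this sketch assumes the calibration step of Elkan and Noto, in which a classifier score g(x) = P(labeled | x) is divided by the label frequency c = P(labeled | treated). All names, scores, and outcomes below are hypothetical and hard-coded for illustration; in practice the scores would come from a classifier trained to separate labeled from unlabeled units. The final difference in means is a valid ATE estimate only when assignment is effectively random; with confounding, an adjustment method would replace it.

```python
from statistics import mean

def label_frequency(labeled_scores):
    """Assumed Elkan-Noto estimate of c = P(labeled | treated):
    the mean classifier score over the labeled (treated) units."""
    return mean(labeled_scores)

def treated_posterior(score, c):
    """Calibrated P(treated | x) = g(x) / c, clipped to at most 1."""
    return min(1.0, score / c)

def select_controls(unlabeled, c, threshold=0.5):
    """Keep unlabeled units whose calibrated treated-probability is below
    the threshold (the threshold value is an illustrative assumption)."""
    return [u for u in unlabeled if treated_posterior(u["score"], c) < threshold]

def difference_in_means(treated, controls):
    """Naive ATE estimate: mean treated outcome minus mean control outcome."""
    treated_mean = mean([u["outcome"] for u in treated])
    control_mean = mean([u["outcome"] for u in controls])
    return treated_mean - control_mean

# Hypothetical labeled treated units: classifier score and observed outcome.
labeled_treated = [
    {"score": 0.625, "outcome": 5.0},
    {"score": 0.75, "outcome": 6.0},
    {"score": 0.875, "outcome": 7.0},
]

# Hypothetical unlabeled units: a mix of hidden treated and control units.
unlabeled = [
    {"score": 0.125, "outcome": 3.0},
    {"score": 0.25, "outcome": 4.0},
    {"score": 0.6875, "outcome": 6.5},  # calibrated probability is high: likely treated
    {"score": 0.0625, "outcome": 5.0},
]

c = label_frequency([u["score"] for u in labeled_treated])
controls = select_controls(unlabeled, c)
ate = difference_in_means(labeled_treated, controls)
print(f"c = {c}, controls found = {len(controls)}, ATE estimate = {ate}")
```

Dividing by c matters because a classifier trained on labeled versus unlabeled units underestimates the probability of treatment whenever some treated units are hidden in the unlabeled pool; the calibration rescales the score before thresholding.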