Automated Detection of Interpretable Causal Inference Opportunities: Regression Discontinuity Subgroup Discovery

Published: 20 Jun 2023, Last Modified: 19 Jul 2023, IMLH 2023 Poster
Keywords: Regression discontinuity, causal inference, compliance estimation, clinical guidelines
TL;DR: We develop an ML method for discovering regression discontinuity opportunities for causal inference in observational clinical data that improves study power.
Abstract: Treatment decisions based on cutoffs of continuous variables, such as the blood sugar threshold for diabetes diagnosis, provide valuable opportunities for causal inference. Regression discontinuities (RDs) are used to analyze such scenarios, where units just above and below the threshold differ only in their treatment assignment status, thus providing as-if randomization. In practice, however, implementing RD studies can be difficult because identifying treatment thresholds requires considerable domain expertise. Furthermore, the thresholds may differ across population subgroups (e.g., the blood sugar threshold for diabetes may differ across demographics), and ignoring these differences can lower statistical power. Here, we introduce Regression Discontinuity SubGroup Discovery (RDSGD), a machine learning method that identifies more powerful and interpretable subgroups for RD thresholds. Using a claims dataset with over 60 million patients, we apply our method to multiple clinical contexts and identify subgroups with increased compliance to treatment assignment thresholds. As subgroup-specific treatment thresholds are relevant to many diseases, RDSGD can be a powerful tool for discovering new avenues for causal estimation across a range of clinical applications.
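As a rough illustration of the RD setup the abstract describes, the sketch below simulates a threshold-based treatment rule with imperfect compliance and estimates the jump in treatment probability at the cutoff using local linear fits on each side. It is not the RDSGD method itself. The simulated data, the cutoff of 0.0, the bandwidth, and the helper names `local_linear_jump` and `simulate` are all assumptions made for illustration only.

```python
import numpy as np

def local_linear_jump(x, y, cutoff, bandwidth):
    """Estimate the jump in E[y | x] at `cutoff` by fitting a separate
    line on each side, using only points within `bandwidth` of the cutoff."""
    left = (x >= cutoff - bandwidth) & (x < cutoff)
    right = (x >= cutoff) & (x <= cutoff + bandwidth)
    # np.polyfit with deg=1 returns [slope, intercept]; centering x at the
    # cutoff makes each intercept the fitted value at the threshold.
    _, left_at_cutoff = np.polyfit(x[left] - cutoff, y[left], 1)
    _, right_at_cutoff = np.polyfit(x[right] - cutoff, y[right], 1)
    return right_at_cutoff - left_at_cutoff

def simulate(n, cutoff, compliance, effect, rng):
    """Hypothetical data: treatment probability jumps by `compliance`
    at the cutoff, and treatment shifts the outcome by `effect`."""
    x = rng.normal(cutoff, 1.0, n)            # running variable
    above = x >= cutoff
    treated = rng.random(n) < (0.1 + compliance * above)
    y = 0.5 * (x - cutoff) + effect * treated + rng.normal(0.0, 1.0, n)
    return x, treated.astype(float), y

rng = np.random.default_rng(0)
cutoff = 0.0                                   # arbitrary illustrative threshold
x, t, y = simulate(20000, cutoff, compliance=0.6, effect=2.0, rng=rng)

# Compliance: discontinuity in treatment probability at the threshold.
compliance_hat = local_linear_jump(x, t, cutoff, bandwidth=0.5)
# Fuzzy-RD (Wald-style) effect: outcome jump divided by treatment jump.
outcome_jump = local_linear_jump(x, y, cutoff, bandwidth=0.5)
effect_hat = outcome_jump / compliance_hat

print(f"estimated compliance: {compliance_hat:.3f}")
print(f"estimated treatment effect: {effect_hat:.3f}")
```

Because the effect estimate divides by the estimated compliance, a subgroup with a larger jump in treatment probability at its threshold tends to yield a more precise estimate, which is one way to read the abstract's link between compliance and statistical power.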
Submission Number: 85