Systematic Unmeasured Confounder Discovery in Observational Pharmacovigilance: A Large Language Model Framework for Enhanced Causal Inference
Keywords: causal inference, unmeasured confounding, large language models, pharmacovigilance, comparative effectiveness research, clinical decision support
TL;DR: We propose an LLM-based framework that leverages both structured and unstructured clinical data to enable scalable causal inference in drug safety research.
Abstract: Background: Unmeasured confounding is a fundamental limitation of observational pharmacovigilance studies. Traditional approaches rely on labor-intensive manual chart review or on limited structured data extraction, neither of which scales. We developed and validated a systematic framework that uses large language models (LLMs) to discover clinical confounders embedded in unstructured clinical narratives, addressing this scalability bottleneck in causal inference for drug safety research.
Methods: We implemented a comprehensive LLM-based confounder discovery framework using GPT-4o-mini with the MIMIC-IV database (2008–2019). Our systematic approach included: (1) temporal reasoning protocols to distinguish pre-treatment confounders from treatment-induced conditions, (2) comprehensive clinical definitions enabling detection of complex comorbidity relationships, (3) conservative error handling to minimize false-positive confounding, and (4) multi-dimensional validation ensuring clinical accuracy. We demonstrated the framework using vancomycin–piperacillin/tazobactam (VPT) combination therapy as a proof-of-concept, comparing acute kidney injury risk against vancomycin monotherapy in 90,327 patients.
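Components (1) and (3) of the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the `ExtractedCondition` record, the `is_pretreatment_confounder` function, and the rule "anything missing or ambiguous is coded as absent" are assumptions chosen only to show how a temporal check and a conservative default might combine.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ExtractedCondition:
    """Hypothetical record for one condition an LLM extracted from a note."""
    name: str
    onset: Optional[datetime]  # None when the note gives no usable onset time


def is_pretreatment_confounder(
    cond: Optional[ExtractedCondition], treatment_start: datetime
) -> int:
    """Return 1 only when the condition clearly precedes treatment.

    Assumed conservative rule: a failed extraction (cond is None) or a
    missing onset time is coded 0, so uncertain cases never add a confounder.
    """
    if cond is None or cond.onset is None:
        return 0
    # Conditions starting at or after treatment may be treatment-induced.
    return 1 if cond.onset < treatment_start else 0


start = datetime(2015, 3, 10, 8, 0)
before = ExtractedCondition("chronic kidney disease", datetime(2014, 6, 1))
after = ExtractedCondition("acute kidney injury", datetime(2015, 3, 12))
undated = ExtractedCondition("sepsis", None)

flags = [is_pretreatment_confounder(c, start) for c in (before, after, undated, None)]
print(flags)
```

Under this assumed rule, only the condition with a documented onset before treatment start is flagged; the post-treatment, undated, and failed extractions all default to 0.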
Results: Adding LLM-discovered confounders improved propensity score discrimination (AUC: 0.562 baseline vs 0.585 LLM-enhanced) and covariate balance after inverse probability of treatment weighting (IPTW; mean absolute SMD: 0.089 baseline vs 0.018 LLM-enhanced). Time-to-event analysis showed that VPT combination therapy significantly increased AKI risk: the LLM-enhanced IPTW hazard ratio was 1.40 (95% CI: 1.35–1.45), compared with 1.44 (95% CI: 1.39–1.49) for the baseline approach. Bootstrap analysis confirmed that the two estimates differ, with a mean log-HR difference (LLM-enhanced minus baseline) of –0.028 (95% CI: –0.035 to –0.021, p < 0.001). The E-value for the LLM-enhanced point estimate was 2.15, meaning an unmeasured confounder would need associations of at least that magnitude with both treatment and outcome to fully explain away the observed effect.
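Two of the reported figures can be reproduced from the point estimates alone. The sketch below assumes the standard E-value formula for a ratio estimate above 1, E = RR + sqrt(RR × (RR − 1)), with the hazard ratio treated as an approximate risk ratio. The abstract does not state which formula was used, so this is a consistency check rather than the authors' code, and the function name `e_value` is illustrative.

```python
import math


def e_value(ratio: float) -> float:
    """E-value for a point estimate on the ratio scale (assumed standard formula)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Protective estimates are inverted so the formula applies symmetrically.
    rr = ratio if ratio >= 1 else 1.0 / ratio
    return rr + math.sqrt(rr * (rr - 1.0))


hr_llm = 1.40       # IPTW hazard ratio reported for the LLM framework
hr_baseline = 1.44  # hazard ratio reported for the baseline approach

e_val = e_value(hr_llm)
log_hr_diff = math.log(hr_llm) - math.log(hr_baseline)

print(f"E-value: {e_val:.2f}")
print(f"log-HR difference: {log_hr_diff:.3f}")
```

Under these assumptions the computed E-value rounds to 2.15 and the log-HR difference to -0.028, matching the reported E-value and the bootstrap mean.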
Conclusions: This systematic LLM framework addresses the unmeasured confounding limitation that has constrained observational pharmacovigilance research for decades. The approach enables immediate scaling to multi-drug comparative effectiveness studies, supports development of personalized risk assessment algorithms, and provides a reproducible methodology for systematic confounder discovery across therapeutic domains.
Supplementary Material: zip
Submission Number: 81