Comparative ranking of marginal confounding impact of natural language processing-derived versus structured features in pharmacoepidemiology

Published: 2025, Last Modified: 15 Oct 2025. Comput. Biol. Medicine 2025. CC BY-SA 4.0
Abstract: Highlights
• Unstructured clinical notes contain potential confounder information.
• Natural language processing can auto-generate features from notes.
• We ranked the marginal confounding impact of features on estimated causal effects.
• Prespecified knowledge and claims data dominated the top 25 ranked features.
• 39 % of the top 100 ranked features were natural language processing generated.