Abstract: We develop a semantics-driven, automated approach for dynamically performing rigorous scientific studies. This framework may be applied to a wide variety of data and study types; here, we demonstrate its suitability for conducting retrospective cohort studies using publicly available population health data. The goal is to identify risk factors that, for some automatically discovered subpopulation, have significant associations with a given health condition. Our semantically targeted analytics (STA) approach addresses the end-to-end data science workflow, from intelligent data selection to the dissemination of derived data and results, in a rigorous, reproducible way. STA drives an automated architecture that allows analysts to rapidly and dynamically conduct studies for different health outcomes, risk factors, cohorts, and analysis methods; it also allows the full analysis pipeline to be specified modularly in a reusable, domain-specific way. The framework developed here may be readily extended to other learning tasks and datasets in the future.