Subgrouping Causal Networks of Disease Onset in Large-scale Health and Medical Data using Supercomputer Fugaku
Abstract: Bayesian networks can be used to infer statistical causal relationships from observed data. When they are applied to a large-scale health and medical dataset, the inferred networks can help identify potential factors related to disease onset. Factors contributing to the onset of lifestyle-related diseases, such as the social environment and habits, vary significantly among individuals. It can therefore be hypothesized that networks illustrating disease onset mechanisms also exhibit substantial diversity. However, typical statistical causal discovery methods make it difficult to analyze relationships specific to subgroups within a dataset, because they estimate a single network from the entire dataset. To address this, we apply a pattern mining technique to the Iwaki Health Promotion Project health checkup data to derive subgroups that exhibit strong correlations with the target variables. We estimated Bayesian networks for the characteristic subgroups among those derived and compared them with the Bayesian network estimated from the entire dataset (hereafter, the base network). Our targets were the onset of eight lifestyle-related diseases within three years, which resulted in a total of 359 subgroups. By comparing the estimated subgroup networks with the base network, we confirmed numerous relationships specific to the subgroup networks. These included not only clinically known relationships but also non-trivial ones. Our approach, which combines target-wise, correlation-based rule subgrouping with network estimation, is beneficial for constructing hypotheses about how the causes of disease onset differ among potential subgroups.
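The two steps named in the abstract, deriving subgroups correlated with a target and then comparing subgroup networks with the base network, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the pattern mining technique or the correlation measure, so the sketch substitutes exhaustive enumeration of short conjunctive rules over binary features and the phi coefficient. The function names (`phi`, `mine_subgroups`, `subgroup_specific_edges`) and the threshold `min_phi` are hypothetical, and Bayesian network estimation itself is not shown; networks are represented only as sets of directed edges.

```python
from itertools import combinations
from math import sqrt


def phi(xs, ys):
    """Phi coefficient between two equal-length 0/1 sequences."""
    n11 = sum(1 for x, y in zip(xs, ys) if x and y)
    n10 = sum(1 for x, y in zip(xs, ys) if x and not y)
    n01 = sum(1 for x, y in zip(xs, ys) if not x and y)
    n00 = sum(1 for x, y in zip(xs, ys) if not x and not y)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def mine_subgroups(rows, target, features, min_phi=0.5, max_len=2):
    """Enumerate conjunctive rules (assumed stand-in for pattern mining).

    A rule is a tuple of binary features that must all equal 1. A rule is
    kept when membership in its subgroup correlates with the target at or
    above min_phi. Returns (rule, phi, member_row_indices) tuples,
    strongest correlation first.
    """
    ys = [row[target] for row in rows]
    found = []
    for k in range(1, max_len + 1):
        for rule in combinations(features, k):
            member = [int(all(row[f] for f in rule)) for row in rows]
            score = phi(member, ys)
            if score >= min_phi:
                indices = [i for i, m in enumerate(member) if m]
                found.append((rule, score, indices))
    return sorted(found, key=lambda item: -item[1])


def subgroup_specific_edges(subgroup_edges, base_edges):
    """Directed edges present in a subgroup network but not in the base network."""
    return set(subgroup_edges) - set(base_edges)
```

In this sketch, each returned subgroup carries the row indices of its members, so a network could be estimated on those rows with any structure learning tool and its edge set passed to `subgroup_specific_edges` together with the edge set of the base network.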