A Rule-Guided Community Detection Method for Identifying Subpopulations in Medical Data

Published: 2025, Last Modified: 21 Jan 2026. IEEE J. Biomed. Health Informatics 2025. License: CC BY-SA 4.0
Abstract: Precisely identifying and explaining subpopulations in heterogeneous populations is essential to understanding disease subtypes. Community detection is a promising approach to identifying subpopulations. However, existing community detection methods for medical data rely solely on individual attribute values and ignore the important association rules between attribute values. Association rules are crucial in medical diagnosis for determining disease subtypes. Thus, we propose a rule-guided community detection (RGCD) method for precisely identifying homogeneous subpopulations. Specifically, RGCD incorporates association rules into the original network, thereby constructing an augmented network. We prove that decomposing the embedding vectors obtained from biased random walks on the augmented network is equivalent to decomposing the transition probability matrix. Based on this proof, we enhance the transition probability matrix through rule-guided biased random walks, resulting in a rule-augmented matrix. By performing matrix decomposition and clustering on this matrix, we achieve precise identification of subpopulations. To the best of our knowledge, this is the first work to incorporate association rules into community detection. Extensive experiments on 10 real-world datasets from medical fields show that RGCD is more competitive than six state-of-the-art community detection methods. The weighted F1 of RGCD is up to 22.62% higher than that of the best existing community detection methods. Furthermore, we provide a qualitative depiction of the subpopulations obtained through RGCD and derive medically significant insights.
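The pipeline outlined in the abstract (augment the network with rules, form a transition probability matrix, decompose it, cluster the result) can be illustrated with a toy sketch. This is not the authors' RGCD implementation: the abstract does not specify how rules bias the walk, which decomposition is used, or which clustering algorithm is applied. The sketch assumes that a rule simply adds a fixed weight between nodes satisfying it, that the decomposition is a truncated SVD, and that clustering is k-means with a deterministic farthest-point initialization. All function names and the `rule_weight` parameter are hypothetical.

```python
import numpy as np

def augment_adjacency(adj, rules, rule_weight=1.0):
    """Assumption: each rule is a list of node indices that satisfy it;
    every pair of such nodes gets rule_weight added to its edge weight."""
    aug = np.asarray(adj, dtype=float).copy()
    for members in rules:
        for i in members:
            for j in members:
                if i != j:
                    aug[i, j] += rule_weight
    return aug

def transition_matrix(weights):
    """Row-normalize edge weights into random-walk transition probabilities."""
    row_sums = weights.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # leave isolated nodes as all-zero rows
    return weights / row_sums

def embed(P, dim):
    """Assumption: truncated SVD stands in for the matrix decomposition."""
    U, s, _ = np.linalg.svd(P)
    return U[:, :dim] * np.sqrt(s[:dim])

def kmeans(X, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Toy network: two triangles (0-1-2 and 3-4-5) joined by the bridge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

# Hypothetical rules: one satisfied by nodes 0-2, another by nodes 3-5.
rules = [[0, 1, 2], [3, 4, 5]]

P = transition_matrix(augment_adjacency(adj, rules, rule_weight=5.0))
labels = kmeans(embed(P, dim=2), k=2)
print(labels)
```

In this toy example the rule weights make within-group transitions much more likely than the bridge transition, so the two triangles end up in separate clusters. How strongly rules should bias the walk is a design choice that the abstract leaves to the full paper.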