Exploratory subgroup detection of psychological networks: Assessing the impact of ordinal and skewed data
DOI: 10.64028/ywzl276136
Keywords: model-based clustering, ordinal data, network analysis, subgroup detection, Gaussian graphical models
TL;DR: In this simulation study, we explore the impact of ordinal and skewed data on the performance of a model-based clustering method potentially suited for psychological network data.
Abstract: Exploratory subgroup identification can be a valuable tool for psychological network science, e.g., to identify patient subgroups with distinct symptom constellations in mental disorders. Gaussian mixture modeling (GMM), a popular method for investigating heterogeneity in multivariate data, offers a promising avenue to achieve this. GMM approaches allow participants to be clustered based on their subgroup-specific network structures, rather than on symptom profiles or sum scores. Recent graphical GMM approaches explicitly model the structure of associations among variables within each cluster \parencite[e.g., ][]{fop2019}. By introducing a graph structure search step into the expectation–maximization (EM) algorithm, these approaches optimize not only the model parameters but also the edge set of each cluster's graph. However, this structural EM algorithm assumes continuous, normally distributed data, whereas real-world psychological data is often ordinal and/or skewed. In this study, we explore how effectively the structural EM algorithm recovers underlying subgroups under conditions frequently encountered in psychological data. To this end, we generate cross-sectional data from three subgroups with different degrees of network sparsity, echoing findings from previous network analyses of psychological disorders. By varying the cluster proportions, the number of ordinal response categories, and variable skewness in the simulated datasets, we evaluate the performance of graphical GMM in terms of clustering and structure recovery. Classification quality and recovery of the true cluster proportions, edge sets, and edge-weight estimates serve as performance indicators.
Supplementary Material: zip
Submission Number: 21
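
The data-generating design described in the abstract (three subgroups with differently sparse networks, unequal cluster proportions, and ordinal, possibly skewed responses) can be illustrated with a minimal sketch. This is not the authors' simulation code: the function names, the random precision-matrix generator, the zero cluster means, and the `power` parameter that controls skewness via the category thresholds are all assumptions made for illustration.

```python
import numpy as np


def sparse_precision(p, edge_prob, rng, weight=0.3):
    """Random sparse precision matrix (hypothetical generator, not from the paper)."""
    upper = np.triu(rng.random((p, p)) < edge_prob, k=1)  # random edge set
    signs = rng.choice([-1.0, 1.0], size=(p, p))
    K = np.where(upper, weight * signs, 0.0)
    K = K + K.T
    # Strict diagonal dominance guarantees positive definiteness.
    np.fill_diagonal(K, np.abs(K).sum(axis=1) + 1.0)
    return K


def ordinalize(x, n_categories, power=1.0):
    """Cut each column into categories 0..n_categories-1.

    Thresholds sit at pooled empirical quantiles. power=1 gives equally
    populated categories; power<1 shifts mass to low categories (right skew).
    """
    probs = np.linspace(0.0, 1.0, n_categories + 1)[1:-1] ** power
    thresholds = np.quantile(x, probs, axis=0)  # shape (n_categories-1, p)
    return (x[None, :, :] > thresholds[:, None, :]).sum(axis=0)


def simulate(n, proportions, edge_probs, p=8, n_categories=5, power=1.0, seed=0):
    """Draw ordinal data from a mixture of Gaussian graphical models."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(proportions), size=n, p=proportions)
    x = np.empty((n, p))
    for k, edge_prob in enumerate(edge_probs):
        cov = np.linalg.inv(sparse_precision(p, edge_prob, rng))
        idx = labels == k
        # Assumption: zero means, so clusters differ only in network structure.
        x[idx] = rng.multivariate_normal(np.zeros(p), cov, size=idx.sum())
    return ordinalize(x, n_categories, power), labels


symmetric, labels = simulate(600, [0.5, 0.3, 0.2], [0.1, 0.3, 0.6])
skewed, _ = simulate(600, [0.5, 0.3, 0.2], [0.1, 0.3, 0.6], power=0.5)
```

Here `symmetric` and `skewed` are integer matrices of five-category responses that could be passed to a clustering routine, with `labels` holding the true subgroup memberships for evaluating classification and structure recovery.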