A Generic Family of Graphical Models: Diversity, Efficiency, and Heterogeneity

Yufei Huang; Changhu Wang; Junjie Tang; Weichi Wu; Ruibin Xi

Published: 01 May 2025, Last Modified: 23 Jul 2025. Venue: ICML 2025 poster. License: CC BY 4.0

Abstract: Traditional network inference methods, such as Gaussian graphical models, are built on continuity and homogeneity and therefore face challenges when modeling discrete data and heterogeneous structures. Furthermore, in high dimensions, parameter estimation for such models can be hindered by the notorious intractability of high-dimensional integrals. In this paper, we introduce a new and flexible device for graphical models that accommodates diverse data types, including Gaussian, Poisson log-normal, and latent Gaussian copula models. The device is driven by a new marginally recoverable parametric family which, thanks to marginal recoverability, can be estimated effectively in high-dimensional settings without evaluating high-dimensional integrals. We further introduce a mixture of marginally recoverable models to capture ubiquitous heterogeneous structures. We show the validity of the models' desirable properties and of the estimation methods, and demonstrate their advantages over state-of-the-art network inference methods via extensive simulation studies and a gene regulatory network analysis of real single-cell RNA sequencing data.

Lay Summary: To understand how variables interact in complex systems, such as how genes influence each other during biological processes, scientists often build networks that map out these relationships. Traditional methods for building these networks assume the data are continuous and come from a single population. In reality, especially in modern biological studies such as single-cell RNA sequencing, the data are often discrete and collected from multiple groups, such as different cell types. Moreover, when the number of variables is very large, traditional methods become computationally intensive and slow. Our research introduces a new statistical framework to tackle these challenges. We designed a model that can handle different kinds of data, including discrete data and mixtures from different populations. To make these models efficient to use, we developed a novel estimation method that avoids the heavy computations typically required in high-dimensional analysis. We show that our method works reliably across many tests and outperforms current state-of-the-art tools.

Primary Area: Probabilistic Methods->Graphical Models

Keywords: High-dimensional graphical model, Maximum marginal likelihood estimation, Marginal recoverability, Mixture model, Gene regulatory network, Single-cell RNA sequencing

Submission Number: 4720
