Learning Gaussian DAG Models without Condition Number Bounds

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We provide algorithms for learning Gaussian DAG Models with sample complexity independent of the condition number.
Abstract: We study the problem of learning the topology of a directed Gaussian Graphical Model under the equal-variance assumption, where the graph has $n$ nodes and maximum in-degree $d$. Prior work has established that $O(d \log n)$ samples are sufficient for this task. However, an important factor that is often overlooked in these analyses is the dependence on the condition number of the covariance matrix of the model. Indeed, all algorithms from prior work require a number of samples that grows polynomially with this condition number. In many cases this is unsatisfactory, since the condition number could grow polynomially with $n$, rendering these prior approaches impractical in high-dimensional settings. In this work, we provide an algorithm that recovers the underlying graph and prove that the number of samples required is independent of the condition number. Furthermore, we establish lower bounds that nearly match the upper bound up to a $d$-factor, thus providing an almost tight characterization of the true sample complexity of the problem. Moreover, under a further assumption that all the variances of the variables are bounded, we design a polynomial-time algorithm that recovers the underlying graph, at the cost of an additional polynomial dependence of the sample complexity on $d$. We complement our theoretical findings with simulations on synthetic datasets that confirm our predictions.
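To make the setting concrete, the variance-based ordering rule used by prior equal-variance algorithms can be sketched as follows: under equal noise variances, a source node has minimal variance, and more generally the next node in a topological order minimizes the conditional variance given the nodes already placed. This is a minimal population-level illustration of the problem setup, not the paper's condition-number-free algorithm, and the function name is ours.

```python
import numpy as np

def equal_variance_order(Sigma):
    """Recover a topological order from the (population) covariance matrix
    of an equal-variance Gaussian DAG model.

    Top-down rule from prior work: repeatedly pick the remaining node with
    the smallest conditional variance given the already-ordered nodes.
    """
    n = Sigma.shape[0]
    order, remaining = [], list(range(n))
    while remaining:
        if not order:
            # No conditioning set yet: a source minimizes marginal variance.
            cond_var = {j: Sigma[j, j] for j in remaining}
        else:
            S = order
            Sss_inv = np.linalg.inv(Sigma[np.ix_(S, S)])
            # Conditional variance of X_j given X_S (Schur complement).
            cond_var = {j: Sigma[j, j] - Sigma[j, S] @ Sss_inv @ Sigma[S, j]
                        for j in remaining}
        nxt = min(cond_var, key=cond_var.get)
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Chain 0 -> 1 -> 2 with unit edge weights and unit noise variance:
# X = (I - B)^{-1} eps gives covariance [[1,1,1],[1,2,2],[1,2,3]].
Sigma = np.array([[1., 1., 1.],
                  [1., 2., 2.],
                  [1., 2., 3.]])
print(equal_variance_order(Sigma))  # -> [0, 1, 2]
```

With finite samples one would plug in the empirical covariance instead; the paper's point is precisely that the sample size needed for such estimates to succeed should not scale with the condition number of $\Sigma$.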
Lay Summary: We study the problem of learning a specific type of Bayesian network, a statistical model used to describe cause-and-effect relationships in many areas. Prior works tackle this problem, but the number of samples they require depends on the condition number (the ratio of the largest to the smallest eigenvalue) of the covariance matrix, which can be large. We provide a new algorithm that finds the Bayesian network with fewer samples, and we prove that the number of samples it requires is independent of the condition number. Additionally, we provide examples showing that the sample complexity of our algorithm is almost tight. Moreover, under certain additional assumptions, we design a polynomial-time algorithm that recovers the underlying graph, again with no dependence on the condition number. Finally, we run simulations on synthetic datasets that confirm our predictions.
Primary Area: Deep Learning->Robustness
Keywords: Bayesian network, graphical model, Gaussian DAG models, condition number
Submission Number: 12346