Exact Statistical Tests for Gene Regulatory Network Discovery from Single-Cell RNA Sequencing

NeurIPS 2025 Workshop CauScien Submission 57 Authors

01 Sept 2025 (modified: 18 Oct 2025). Submitted to NeurIPS 2025 Workshop CauScien. License: CC BY 4.0
Keywords: Gene Regulatory Networks, Single-Cell RNA Sequencing, Contrastive Learning, Bayesian Inference, Network Biology
Abstract: Gene regulatory networks encode causal relationships between transcription factors and target genes, but inferring these networks from single-cell RNA sequencing data is hampered by extreme sparsity and class imbalance. We present a framework that uses exact statistical tests to evaluate whether predicted regulatory edges are enriched above background rates among the top-ranked predictions, where experimental validation would focus. This approach moves beyond global metrics to assess performance where it matters for practical discovery. Using our scoring method, we demonstrate strong performance in two evaluations. On human Crohn disease lamina propria data, the held-out regulatory interaction ranks first among a large number of candidate edges, and Fisher's exact test yields significant p-values for enrichment in the top predictions. Curated positive edges receive a mean posterior probability of 0.908, versus 0.0054 for random negatives. Across 42 BEELINE benchmark datasets, we achieve a mean ROC-AUC of 0.926 and a mean precision of 33.5% in the top 100 predictions (a 47-fold improvement over random selection). Enrichment tests confirm statistical significance on all 42 datasets. These results show that exact statistical tests provide actionable evidence for network discovery, offering practical guidance for experimental validation while maintaining statistical rigor for structure learning from noisy single-cell data.
Submission Number: 57
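
The abstract describes testing whether true regulatory edges are enriched among the top-ranked predictions. The following is a minimal sketch of one way such a test could be set up, not the authors' implementation: it assumes a flat list of edge scores with binary ground-truth labels and computes the one-sided Fisher's exact test p-value as the upper tail of the hypergeometric distribution. The function name `top_k_enrichment` and the toy data are hypothetical and chosen only for illustration.

```python
from math import comb


def top_k_enrichment(scores, labels, k):
    """One-sided Fisher's exact test for enrichment of positives in the top k.

    scores: predicted edge scores (higher = more confident); assumed input format.
    labels: 1 for a known regulatory edge, 0 otherwise.
    Returns (hits, precision_at_k, fold_over_background, p_value).
    """
    n_total = len(scores)
    n_pos = sum(labels)

    # Rank candidate edges by score, highest first, and count true edges in the top k.
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in order[:k])

    # P(X >= hits) for X ~ Hypergeometric(N=n_total, K=n_pos, n=k).
    # This equals the one-sided ("greater") Fisher's exact test on the 2x2 table
    # of (in top k / not in top k) by (true edge / not a true edge).
    upper = min(k, n_pos)
    tail = sum(
        comb(n_pos, x) * comb(n_total - n_pos, k - x)
        for x in range(hits, upper + 1)
    )
    p_value = tail / comb(n_total, k)

    precision = hits / k
    background = n_pos / n_total
    fold = precision / background if background > 0 else float("inf")
    return hits, precision, fold, p_value


# Toy example (made-up numbers): 10 candidate edges, 3 known true edges.
demo_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
demo_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
hits, precision, fold, p_value = top_k_enrichment(demo_scores, demo_labels, k=3)
print(hits, round(precision, 3), round(fold, 3), round(p_value, 4))
```

In this toy case two of the three top-ranked edges are true, against a background rate of 3 in 10. With so few candidates the p-value is not small, which illustrates why the test is most informative when the candidate set is large and true edges are rare, as in the sparse setting the abstract describes.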