CAUSALPERT: GROUNDING LLM HYPOTHESES IN REGULATORY NETWORKS FOR GENE PERTURBATION PREDICTION

Published: 02 Mar 2026, Last Modified: 03 Jun 2026
Venue: MLGenX 2026 Poster
License: CC BY 4.0
Abstract: Predicting transcriptional responses to unseen genetic perturbations is essential for understanding gene regulation and prioritizing large-scale perturbation experiments. Existing approaches either rely on static, potentially incomplete knowledge graphs or prompt language models for functionally similar genes, retrieving associations shaped by symmetric co-occurrence in scientific text rather than by directed regulatory logic. We introduce CausalPert, a lightweight framework that encourages LLM agents to generate directed regulatory hypotheses rather than relying solely on functional similarity. Multiple agents independently propose candidate regulators with associated confidence scores; these are aggregated through a consensus mechanism that filters spurious associations, producing weighted neighborhoods for downstream prediction. We evaluate CausalPert on Perturb-seq benchmarks across four human cell lines. For perturbation prediction in low-data regimes ($N=50$ observed perturbations), CausalPert improves Pearson correlation by up to 10.5% over similarity-based baselines. For experimental design, CausalPert-selected anchor genes outperform standard network centrality heuristics by up to 46% in well-characterized cell lines.
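The abstract does not spell out how the consensus mechanism combines agent proposals, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each agent returns a mapping from candidate regulator to a confidence in [0, 1], treats any candidate proposed by fewer than a hypothetical `min_agents` agents as spurious, scores survivors by confidence averaged over all agents, and normalizes the scores into a weighted neighborhood. The function name `consensus_neighborhood` and the gene names are placeholders.

```python
from __future__ import annotations

from collections import defaultdict


def consensus_neighborhood(
    proposals: list[dict[str, float]],
    min_agents: int = 2,
) -> dict[str, float]:
    """Aggregate per-agent regulator proposals into a weighted neighborhood.

    proposals: one dict per agent, mapping candidate regulator -> confidence in [0, 1].
    min_agents: hypothetical vote threshold; candidates proposed by fewer
    agents are treated as spurious and dropped.
    """
    n_agents = len(proposals)
    if n_agents == 0:
        return {}
    votes: dict[str, int] = defaultdict(int)
    total_conf: dict[str, float] = defaultdict(float)
    for agent in proposals:
        for gene, conf in agent.items():
            votes[gene] += 1
            total_conf[gene] += conf
    # Average over *all* agents (a missing proposal counts as 0 confidence),
    # so candidates backed by fewer agents are down-weighted.
    kept = {
        gene: total_conf[gene] / n_agents
        for gene in votes
        if votes[gene] >= min_agents
    }
    norm = sum(kept.values())
    if norm == 0:
        return {}
    # Normalize so the neighborhood weights sum to 1.
    return {gene: score / norm for gene, score in kept.items()}


# Usage with placeholder gene names (illustrative values, not real predictions).
agents = [
    {"GENE_A": 0.9, "GENE_B": 0.6},
    {"GENE_A": 0.8, "GENE_C": 0.7},
    {"GENE_A": 0.7, "GENE_B": 0.5},
]
nb = consensus_neighborhood(agents, min_agents=2)
print({g: round(w, 2) for g, w in nb.items()})  # {'GENE_A': 0.69, 'GENE_B': 0.31}
```

Here `GENE_C` is dropped because only one of three agents proposed it, which illustrates how agreement across independent agents can act as a filter on one-off associations.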
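The abstract says the weighted neighborhoods feed a downstream predictor and that results are scored by Pearson correlation, but it does not name the predictor. As an assumption for illustration only, the sketch below uses a simple weighted average: the expression-change vector (perturbed minus control) of an unseen perturbation is estimated from the observed responses of its neighbors, with weights renormalized over neighbors that actually have data. The names `predict_response` and `pearson`, the toy vectors, and the gene names are all hypothetical.

```python
from __future__ import annotations

import numpy as np


def predict_response(
    neighborhood: dict[str, float],
    observed: dict[str, np.ndarray],
) -> np.ndarray:
    """Predict an unseen perturbation's expression-change vector as the
    weighted average of observed neighbors' responses (hypothetical predictor)."""
    pairs = [
        (w, observed[g]) for g, w in neighborhood.items() if g in observed and w > 0
    ]
    if not pairs:
        raise ValueError("no neighbor with observed perturbation data")
    weights = np.array([w for w, _ in pairs], dtype=float)
    responses = np.stack([r for _, r in pairs])  # shape: (n_neighbors, n_genes)
    # Renormalize over neighbors that actually have observed data.
    return weights @ responses / weights.sum()


def pearson(pred: np.ndarray, truth: np.ndarray) -> float:
    """Pearson correlation between predicted and measured change vectors."""
    return float(np.corrcoef(pred, truth)[0, 1])


# Toy usage: 3 readout genes, placeholder gene names and values.
observed = {
    "GENE_A": np.array([1.0, 0.0, -1.0]),
    "GENE_B": np.array([0.0, 1.0, 0.0]),
}
neighborhood = {"GENE_A": 0.75, "GENE_B": 0.25, "GENE_X": 0.5}  # GENE_X unobserved
pred = predict_response(neighborhood, observed)  # [0.75, 0.25, -0.75]
truth = np.array([1.0, 0.5, -1.0])
print(round(pearson(pred, truth), 3))
```

In a low-data regime such as the $N=50$ setting, many neighbors of a target gene will be unobserved, which is why the sketch skips them and renormalizes rather than treating them as zero responses.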
Submission Number: 109