CAUSALPERT: GROUNDING LLM HYPOTHESES IN REGULATORY NETWORKS FOR GENE PERTURBATION PREDICTION

Marc Boubnovski Martell; Josefa Lia Stoisser; Lawrence Phillips; Aditya Misra; Robert Kitchen; Jesper Ferkinghoff-Borg; Jialin Yu; Philip Torr; Kaspar Märtens

Published: 02 Mar 2026, Last Modified: 17 Apr 2026

Venue: MLGenX 2026 Poster

License: CC BY 4.0

Track: Main track

Abstract: Predicting transcriptional responses to unseen genetic perturbations is essential for understanding gene regulation and prioritizing large-scale perturbation experiments. Existing approaches either rely on static, potentially incomplete knowledge graphs, or prompt language models for functionally similar genes, retrieving associations shaped by symmetric co-occurrence in scientific text rather than directed regulatory logic. We introduce CausalPert, a lightweight framework that encourages LLM agents to generate directed regulatory hypotheses rather than relying solely on functional similarity. Multiple agents independently propose candidate regulators with associated confidence scores; these are aggregated through a consensus mechanism that filters spurious associations, producing weighted neighborhoods for downstream prediction. We evaluate CausalPert on Perturb-seq benchmarks across four human cell lines. For perturbation prediction in low-data regimes ($N=50$ observed perturbations), CausalPert improves Pearson correlation by up to 10.5\% over similarity-based baselines. For experimental design, CausalPert-selected anchor genes outperform standard network centrality heuristics by up to 46\% in well-characterized cell lines.
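The consensus step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `aggregate_consensus`, the `min_votes` threshold, the averaging rule, and the example gene proposals are all assumptions chosen for illustration. It assumes each agent returns a mapping from candidate regulator to a confidence in [0, 1], keeps only regulators proposed by enough agents, and normalizes the surviving scores into neighborhood weights.

```python
from collections import defaultdict


def aggregate_consensus(agent_proposals, min_votes=2):
    """Combine per-agent regulator proposals into a weighted neighborhood.

    agent_proposals: list of dicts, one per agent, mapping a candidate
        regulator gene to that agent's confidence in [0, 1].
    min_votes: hypothetical threshold; a regulator proposed by fewer
        agents than this is treated as spurious and dropped.
    Returns a dict mapping each retained regulator to a weight; the
    weights sum to 1 (or the dict is empty if nothing survives).
    """
    confidences = defaultdict(list)
    for proposal in agent_proposals:
        for gene, confidence in proposal.items():
            confidences[gene].append(confidence)

    n_agents = len(agent_proposals)
    # Average over ALL agents, so an agent that did not propose a gene
    # counts as a confidence of zero for that gene.
    scores = {
        gene: sum(values) / n_agents
        for gene, values in confidences.items()
        if len(values) >= min_votes
    }

    total = sum(scores.values())
    if total == 0:
        return {}
    return {gene: score / total for gene, score in scores.items()}


# Illustrative (made-up) proposals from three agents for one target gene.
proposals = [
    {"TP53": 0.9, "MYC": 0.6},
    {"TP53": 0.7, "GATA1": 0.4},
    {"TP53": 0.8, "MYC": 0.3},
]
weights = aggregate_consensus(proposals, min_votes=2)
print(weights)
```

In this sketch a regulator named by a single agent is discarded, while regulators with broad, high-confidence support receive the largest weights. How CausalPert actually scores, filters, and uses these neighborhoods for downstream prediction is not specified in the abstract.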

AI Policy Confirmation: I confirm that this submission clearly discloses the role of AI systems and human contributors and complies with the ICLR 2026 Policies on Large Language Model Usage and the ICLR Code of Ethics.

Submission Number: 109
