CAUSALPERT: GROUNDING LLM HYPOTHESES IN REGULATORY NETWORKS FOR GENE PERTURBATION PREDICTION

Published: 02 Mar 2026, Last Modified: 02 Mar 2026
Venue: MLGenX 2026 Poster
License: CC BY 4.0
Track: Main track
Keywords: LLM, PERTURBATION PREDICTION
Abstract: Predicting gene perturbation effects in unseen contexts is essential for understanding regulatory networks and identifying therapeutic targets. Current methods face a trade-off: Graph Neural Networks are limited by incomplete databases, while LLM-based methods confuse textual co-occurrence with true regulatory relationships. We introduce CausalPert, a framework that uses LLM consensus to infer directed regulatory relationships, constructing a latent GRN that guides both prediction and experimental design. CausalPert makes two key changes to existing semantic baselines for LLM-based perturbation prediction. (1) For predicting unseen perturbation effects, instead of asking an LLM to find "similar genes," it prompts the LLM to identify upstream regulators of a target gene, runs this query three times independently, and keeps only the candidates that appear consistently. (2) For selecting which genes to experimentally perturb first, it asks the LLM to nominate the genes most likely to control many regulatory targets, then ranks them by agreement across runs. For perturbation prediction (1), our method improves correlation by 10.5% over semantic baselines in few-shot regimes (N = 50). For experimental design (2), selecting just 50 anchors via LLM consensus in K562 outperforms network centrality heuristics by up to 46%.
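The consensus step described in (1) and (2) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `query_llm`, the gene symbols, and the vote threshold are all assumptions, with the LLM call replaced by a stub.

```python
from collections import Counter

def consensus_regulators(query_llm, target_gene, n_runs=3, min_votes=3):
    """Keep only candidates nominated consistently across independent runs.

    `query_llm` is a hypothetical callable (stand-in for an LLM prompt)
    returning a list of gene symbols proposed as upstream regulators of
    `target_gene`. Each run contributes at most one vote per candidate;
    survivors are ranked by agreement (vote count), ties broken by name.
    """
    votes = Counter()
    for _ in range(n_runs):
        votes.update(set(query_llm(target_gene)))  # de-duplicate within a run
    kept = [g for g, v in votes.items() if v >= min_votes]
    return sorted(kept, key=lambda g: (-votes[g], g))

# Toy stand-in for three independent LLM runs with partial overlap.
runs = iter([["TP53", "MYC", "KLF1"], ["TP53", "MYC"], ["MYC", "TP53", "GATA1"]])
print(consensus_regulators(lambda g: next(runs), "HBB"))  # → ['MYC', 'TP53']
```

The same ranking-by-agreement logic would apply to anchor selection in (2), with the prompt asking for high-fan-out regulators instead of upstream regulators of a single target.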
AI Policy Confirmation: I confirm that this submission clearly discloses the role of AI systems and human contributors and complies with the ICLR 2026 Policies on Large Language Model Usage and the ICLR Code of Ethics.
Submission Number: 109