LLM-guided acquisition improves pathway-specific Perturb-seq design under experimental budgets

Published: 28 May 2026, Last Modified: 28 May 2026. Venue: GenBio 2026 Poster. License: CC BY 4.0
Keywords: Active Learning, Large Language Models, Uncertainty estimation, single-cell perturbation, gene perturbation prediction
TL;DR: We propose an LLM-guided acquisition strategy for budgeted Perturb-seq design that uses biological reasoning to select a diverse, pathway-relevant batch, outperforming baselines and giving an interpretable rationale for each selection.
Abstract: We study LLM-guided acquisition for budgeted Perturb-seq design, where each perturbation is a costly wet-lab experiment and acquisition strategy is central to sample efficiency. We find that with modern perturbation-response predictors, common acquisition methods based on curated priors or uncertainty often fail to beat random selection. We therefore propose a two-stage strategy for experimental planning: candidates are shortlisted by ensemble-based epistemic uncertainty, then re-ranked by an LLM using pathway context, candidate annotations, and diversity criteria, with a concise rationale per selection. On public Perturb-seq benchmarks, our method achieves the best predictive performance on pathway genes for 5 of 5 evaluated pathways in K562 and 2 of 4 in RPE1; in blinded LLM-as-judge evaluation with three independent judges, our selections are preferred over random and BALD and tie with IterPert, indicating the edge over IterPert is predictive rather than in judged coherence. Ablations show the LLM relies on supplied biological context rather than gene-symbol memorisation, and performance is comparable across two reasoning-capable backbones. Together, these results suggest base LLMs already encode useful structure for Perturb-seq experimental planning, with clear headroom from retrieval, fine-tuning, and richer agentic loops.
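The two-stage selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: epistemic uncertainty is approximated by the variance of predictions across ensemble members, averaged over readout genes; the LLM re-ranking step is replaced by a hypothetical scoring callable (`score_fn`) so the example runs offline; and all names (`shortlist_by_uncertainty`, `rerank`, `GENE_A`, `pathway_hint`) are illustrative only.

```python
from typing import Callable, Sequence

import numpy as np


def shortlist_by_uncertainty(
    ensemble_preds: np.ndarray, names: Sequence[str], k: int
) -> list[str]:
    """Stage 1: keep the k candidates on which the ensemble disagrees most.

    ensemble_preds has shape (n_models, n_candidates, n_genes).
    Assumption: disagreement = variance across models, averaged over genes.
    """
    disagreement = ensemble_preds.var(axis=0).mean(axis=1)  # (n_candidates,)
    top = np.argsort(-disagreement, kind="stable")[:k]
    return [names[i] for i in top]


def rerank(
    shortlist: Sequence[str], score_fn: Callable[[str], float], budget: int
) -> list[str]:
    """Stage 2: re-rank the shortlist and keep `budget` candidates.

    score_fn is a hypothetical stand-in for the LLM, which in the paper
    reasons over pathway context, annotations, and diversity criteria.
    """
    return sorted(shortlist, key=score_fn, reverse=True)[:budget]


# Toy data: 2 ensemble members, 4 candidate perturbations, 2 readout genes.
names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
model_0 = np.zeros((4, 2))
model_1 = np.array([[0.0, 0.0], [2.0, 2.0], [1.0, 1.0], [3.0, 3.0]])
ensemble_preds = np.stack([model_0, model_1])

# Hypothetical pathway-relevance scores standing in for LLM judgments.
pathway_hint = {"GENE_C": 1.0, "GENE_B": 0.5}

shortlist = shortlist_by_uncertainty(ensemble_preds, names, k=3)
selected = rerank(shortlist, lambda g: pathway_hint.get(g, 0.0), budget=2)
print(shortlist, selected)
```

In this toy case the uncertainty stage ranks `GENE_D` first, but the stand-in relevance score moves `GENE_C` and `GENE_B` ahead of it, which mirrors how a re-ranker can override pure uncertainty ordering within the shortlist.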
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 131