Keywords: Large Language Models, Genomics applications, Cellular perturbation prediction
TL;DR: We propose a hybrid model, LangPert, that combines LLMs with kNN to predict the effects of unseen gene perturbations with state-of-the-art performance.
Abstract: Predicting cellular responses to previously unseen genetic perturbations remains a fundamental challenge in computational biology, with broad applications in understanding gene function, disease mechanisms, and therapeutic development. Despite advances in computational approaches, developing models that generalize effectively to novel perturbations continues to be difficult. Large Language Models (LLMs) have shown promise in biological applications by synthesizing scientific knowledge, but their direct application to high-dimensional gene expression data has been impractical due to numerical limitations. We propose LangPert, a novel hybrid framework that leverages LLMs to guide a downstream k-nearest neighbors (kNN) aggregator, combining biological reasoning with efficient numerical inference. We demonstrate that LangPert achieves state-of-the-art performance on single-gene perturbation prediction tasks across multiple datasets.
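The hybrid idea in the abstract (an LLM guides which neighbors a kNN aggregator uses) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how neighbors are selected or combined, so the sketch assumes that the LLM returns a list of already-measured perturbed genes judged similar to the unseen target, and that the aggregator takes a plain mean of their expression profiles. The names `select_neighbors_stub` and `knn_aggregate` are hypothetical, and the stub stands in for a real LLM call.

```python
import numpy as np


def select_neighbors_stub(target_gene, candidates, k):
    """Hypothetical placeholder for the LLM step.

    A real system would prompt an LLM to pick, from `candidates`, the
    perturbed genes most functionally related to `target_gene`. Here we
    simply return the first k candidates alphabetically (excluding the
    target) so the example runs without any model access.
    """
    pool = sorted(g for g in candidates if g != target_gene)
    return pool[:k]


def knn_aggregate(neighbor_genes, train_profiles):
    """Average the expression profiles of the chosen neighbors.

    `train_profiles` maps a perturbed gene name to its measured
    expression vector. Names not present in the training set are
    ignored, since a language model may suggest genes that were
    never measured. Mean aggregation is an assumption of this sketch.
    """
    found = [train_profiles[g] for g in neighbor_genes if g in train_profiles]
    if not found:
        raise ValueError("none of the proposed neighbors are in the training set")
    return np.mean(np.stack(found), axis=0)


# Toy usage: two measured perturbations, one unseen target.
train_profiles = {
    "GENE_A": np.array([1.0, 2.0]),
    "GENE_B": np.array([3.0, 4.0]),
}
neighbors = select_neighbors_stub("GENE_X", list(train_profiles), k=2)
prediction = knn_aggregate(neighbors, train_profiles)
```

The split keeps the language model away from the high-dimensional numbers: it only names neighbors, and the numerical prediction comes from measured profiles.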
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 21062