LangPert: LLM-Driven Contextual Synthesis for Unseen Perturbation Prediction

Kaspar Märtens; Marc Boubnovski Martell; Cesar A. Prada-Medina; Rory Donovan-Maiye

Published: 05 Mar 2025, Last Modified: 25 Apr 2025, Venue: MLGenX 2025 Spotlight, License: CC BY 4.0

Track: Main track (up to 8 pages)

Abstract: Systematic genetic perturbations provide critical insights into cell function, yet predicting their cellular effects remains a major challenge. Despite advances in computational approaches, accurately modelling cellular responses to unseen perturbations remains difficult. Large Language Models (LLMs) have shown promise in biological applications by synthesizing scientific knowledge, but applying them directly to high-dimensional gene expression data has been impractical due to their numerical limitations. We propose LangPert, a novel hybrid framework that uses LLMs to guide a downstream k-nearest neighbors (kNN) aggregator, combining biological reasoning with efficient numerical inference. We demonstrate that LangPert achieves state-of-the-art performance on single-gene perturbation prediction tasks across multiple datasets.
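The hybrid idea in the abstract (an LLM proposes neighbours, a kNN-style aggregator does the numerical work) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function name `aggregate_neighbors`, the toy gene names and expression values, the hard-coded `llm_neighbors` list standing in for an actual LLM response, and the choice of an unweighted mean as the aggregation rule.

```python
from statistics import fmean


def aggregate_neighbors(neighbor_genes, seen_profiles):
    """Predict an unseen perturbation's profile as the unweighted mean of
    the profiles of its proposed neighbours (assumed aggregation rule)."""
    # Keep only neighbours whose perturbation effect was actually measured.
    usable = [g for g in neighbor_genes if g in seen_profiles]
    if not usable:
        raise ValueError("no proposed neighbour has a measured profile")
    # Transpose so each tuple holds one gene-expression feature across neighbours.
    columns = zip(*(seen_profiles[g] for g in usable))
    return [fmean(col) for col in columns]


# Hypothetical toy data: perturbed gene -> expression-change vector.
seen_profiles = {
    "GENE_A": [1.0, 0.0, -1.0],
    "GENE_B": [3.0, 2.0, 1.0],
    "GENE_C": [-5.0, 4.0, 0.5],
}

# Stand-in for the LLM step: genes it might name as functionally similar
# to the unseen target. GENE_X is not in the training set and is skipped.
llm_neighbors = ["GENE_A", "GENE_B", "GENE_X"]

prediction = aggregate_neighbors(llm_neighbors, seen_profiles)
print(prediction)
```

In this sketch the language model never handles expression values; it only selects which measured perturbations to average, which is one way to read the abstract's split between biological reasoning and numerical inference.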

Submission Number: 76
