Retrieval-Augmented Generation for Predicting Cellular Responses to Gene Perturbation

Published: 02 Mar 2026, Last Modified: 02 Mar 2026 | Gen² 2026 Poster | Readers: Everyone | CC BY 4.0
Track: Full / long paper (5-8 pages)
Keywords: cellular responses, perturbation modelling, retrieval-augmented generation
TL;DR: We propose PT-RAG, the first differentiable RAG framework for predicting single-cell perturbation responses. Unlike standard RAG, PT-RAG learns to retrieve a cell-aware context, improving generalization to unseen cells.
Abstract: Predicting how cells respond to genetic perturbations is fundamental to understanding gene function, disease mechanisms, and therapeutic development. While recent deep learning approaches have shown promise in modelling single-cell perturbation responses, they struggle to generalize across cell types and perturbation contexts due to limited contextual information during generation. We introduce PT-RAG (Perturbation-aware Two-stage Retrieval-Augmented Generation), a novel framework that extends Retrieval-Augmented Generation beyond traditional language model applications to the domain of cellular biology. Unlike standard RAG systems designed for text retrieval with pre-trained LLMs, perturbation retrieval lacks established similarity metrics and requires learning what constitutes relevant context, making differentiable retrieval essential. PT-RAG addresses this through a two-stage pipeline: first retrieving candidate perturbations K using GenePT embeddings, then adaptively refining the selection through Gumbel-Softmax discrete sampling conditioned on both the cell state and the input perturbation. This cell-type-aware differentiable retrieval enables end-to-end optimization of the retrieval objective jointly with generation. On the Replogle single-gene perturbation dataset, we demonstrate that PT-RAG consistently outperforms both STATE and vanilla RAG under identical experimental conditions, underscoring that naive retrieval does not directly address the need for perturbation context. Our results establish retrieval-augmented generation as a promising paradigm for modelling cellular responses to gene perturbation. The code to reproduce our experiments is available at https://anonymous.4open.science/r/PT-RAG_ICLR-67E8.
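Illustrative sketch (not the authors' implementation): the two-stage pipeline described in the abstract can be pictured as a similarity-based top-K lookup followed by a Gumbel-Softmax relaxation over the retrieved candidates. The snippet below shows only the forward pass in plain Python, on toy 3-dimensional vectors. Everything in it is an assumption made for illustration: the gene names, the embedding values, the `relevance_logit` scoring rule, and the choice of a single categorical distribution over candidates. The abstract does not specify these details, and real GenePT embeddings are not used here.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_candidates(query, bank, k):
    """Stage 1: indices of the k bank vectors most similar to the query."""
    ranked = sorted(range(len(bank)),
                    key=lambda i: cosine(query, bank[i]),
                    reverse=True)
    return ranked[:k]


def relevance_logit(candidate, cell_state, perturbation):
    """Hypothetical scoring rule: dot product of the candidate with the
    sum of cell-state and perturbation vectors. In a trained model this
    would be a learned network."""
    return sum(c * (s + p)
               for c, s, p in zip(candidate, cell_state, perturbation))


def gumbel_softmax(logits, tau, rng):
    """Stage 2: relaxed one-hot sample over candidates (forward pass only)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)      # avoid log(0)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)                     # subtract max for stability
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Toy stand-ins for perturbation embeddings (assumed values).
names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
bank = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
perturbation = [1.0, 0.05, 0.0]  # embedding of the input perturbation
cell_state = [0.2, 0.8, 0.0]     # toy cell-state vector

candidate_idx = top_k_candidates(perturbation, bank, k=3)
logits = [relevance_logit(bank[i], cell_state, perturbation)
          for i in candidate_idx]
weights = gumbel_softmax(logits, tau=0.5, rng=random.Random(0))

for i, w in zip(candidate_idx, weights):
    print(f"{names[i]}: weight {w:.3f}")
```

Lowering `tau` pushes the weights toward a one-hot selection, while raising it spreads them more evenly. In an automatic-differentiation framework the same computation would let gradients from the generation loss reach the scoring function, which is the property the abstract refers to as differentiable retrieval.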
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 45