Track: Main track
Keywords: Gene Expression, Genetic Perturbation Modelling, Distributional Shift, Histogram
TL;DR: In this work, we train a neural network that predicts per-gene histograms of gene expression following genetic perturbations.
Abstract: We introduce a simple, histogram-based approach for predicting distributional responses in gene expression following genetic perturbations. This is an essential task in early-stage drug discovery, where such responses can offer insights into gene function and inform target identification. Existing methods optimize only for changes in mean expression, overlooking the stochasticity inherent in single-cell data. We instead model per-gene expression distributions, predicting histograms conditioned on perturbations. This captures higher-order statistics (variance, skewness, kurtosis), on which our method outperforms baselines at a fraction of the training cost. To generalize to unseen perturbations, we incorporate prior knowledge via gene embeddings from large language models (LLMs). While modeling a richer output space, the method remains competitive at predicting mean expression changes. This work demonstrates that explicitly modeling distributional responses yields richer biological insights while remaining practical and efficient.
AI Policy Confirmation: I confirm that this submission clearly discloses the role of AI systems and human contributors and complies with the ICLR 2026 Policies on Large Language Model Usage and the ICLR Code of Ethics.
Submission Number: 50