A single-cell gene expression language model

William Connell; Umair Khan; Michael Keiser

09 Oct 2022 (modified: 20 Jul 2025) | LMRL 2022 Paper | Readers: Everyone

Keywords: scRNA-seq, gene regulation, transcriptomics, self-supervised, pretraining, transfer learning

TL;DR: We model regulatory complexity by learning gene dependencies across scRNA-seq expression contexts with a self-supervised task.

Abstract: Gene regulation is a dynamic process that connects genotype and phenotype. Given the difficulty of physically mapping mammalian gene circuitry, we require new computational methods to learn regulatory rules. Natural language is a valuable analogy to the communication of regulatory control. Machine learning systems model natural language by explicitly learning context dependencies between words. We propose a similar system applied to single-cell RNA expression profiles to learn context dependencies between genes. Our model, Exceiver, is trained across a diversity of cell types using a self-supervised task formulated for discrete count data, accounting for feature sparsity. We found agreement between the similarity profiles of latent sample representations and learned gene embeddings with respect to biological annotations. We evaluated Exceiver on a new dataset and a downstream prediction task and found that pretraining supports transfer learning. Our work provides a framework to model gene regulation on a single-cell level and transfer knowledge to downstream tasks.

Community Implementations: [1 code implementation (CatalyzeX)](https://www.catalyzex.com/paper/a-single-cell-gene-expression-language-model/code)