Abstract: In recent years, genomic experiments have generated vast amounts of expression data, including bulk RNA-seq and single-cell RNA-seq datasets from both observational and perturbational studies. However, existing models have not fully leveraged this diverse data landscape, as they model either bulk or single-cell expression data but not both. We present Funomics T0, the first foundation model that can simultaneously learn from bulk RNA-seq and single-cell RNA-seq datasets from observational and perturbational studies. The proposed Perceiver-based model produces latent representations of the expression data that can be used for various downstream tasks. We evaluate our model on perturbation prediction and tissue annotation tasks using a comprehensive benchmark suite and demonstrate strong performance across metrics, with Funomics T0 outperforming the State model on multiple perturbation metrics.