scREBOUND: An Efficient Design of single-cell Foundation Model with Batch Representation

ICLR 2026 Conference Submission13272 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: single-cell genomics, foundation model, representation learning
Abstract: Recent advances in single-cell foundation models (scFMs) have demonstrated the promise of large-scale pretraining on single-cell RNA sequencing (scRNA-seq) data for a wide range of downstream biological tasks. However, existing models such as scGPT, UCE, scFoundation, and scMulan demand substantial computational resources for both training and inference, limiting their accessibility and practical deployment in academic settings. Furthermore, the systematic noise across experimental batches of scRNA-seq datasets, known as the batch effect, is not adequately removed by the masked token prediction tasks that these models commonly use. This significantly jeopardizes the zero-shot performance of these models on data from new experiments. In this work, we present a novel and efficient design for single-cell foundation models that significantly reduces computational costs while improving the robustness of cell representation learning. Our architecture introduces a biologically informed compression strategy that reduces the number of input tokens per cell without sacrificing key transcriptomic signals. We also propose a novel biologically informed batch encoding strategy and introduce a multi-granular supervised contrastive loss to account for the batch effect during the pre-training phase. We validate our design through extensive experiments across diverse datasets, demonstrating competitive performance in key zero-shot tasks including cell type annotation, batch effect removal, cross-species knowledge transfer, and missing value imputation, while achieving up to a 17x reduction in inference time and a 30x reduction in memory usage compared to the SOTA model scGPT. Our design makes foundational single-cell modeling more accessible and robust.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 13272
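
Illustrative note on the contrastive objective: the abstract mentions a multi-granular supervised contrastive loss but does not give its form. As a rough point of reference only, the sketch below implements the generic supervised contrastive loss in NumPy, in which cells sharing a label are pulled together and all other cells are pushed apart. The function name, the `temperature` default, and the toy data are assumptions made for illustration; this is not the authors' multi-granular formulation.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (a sketch, not the paper's variant).

    embeddings: (n, d) array of cell embeddings.
    labels: (n,) array of labels, e.g. cell types (assumed for illustration).
    """
    z = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # L2-normalize so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    # Exclude each anchor from its own softmax denominator.
    sim = np.where(self_mask, -np.inf, sim)

    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )

    # Positives: other samples with the same label as the anchor.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors with at least one positive
    if not valid.any():
        return 0.0

    # Average log-probability over each anchor's positives, then over anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float((-pos_log_prob[valid] / pos_count[valid]).mean())


# Toy usage: two tight clusters of "cells" in a 2-D embedding space.
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_matched = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy setup the loss is lower when the labels agree with the cluster structure than when they cut across it. One plausible reading of "multi-granular" is that such a term is evaluated at several label resolutions and combined, but that is an assumption here, not something the abstract states.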