Abstract: Gene expression can be viewed as a form of “cell language”, with the underlying regulatory mechanisms akin to a biological grammar. Decoding this language is critical to understanding cellular functions and behaviors. In this study, we propose a new pre-training paradigm that integrates rich metadata and multiple pre-training tasks, and we develop scMulan, a multitask generative pre-trained language model for single-cell analyses. Guided by different task prompts, scMulan can accomplish multiple tasks in a zero-shot manner, such as cell-type annotation, batch integration, and conditional cell generation. scMulan can also be extended to novel tasks through fine-tuning.