A Simple and Comprehensive Benchmark for Single-Cell Transcriptomics

Published: 01 Jan 2025, Last Modified: 09 Aug 2025, Venue: AAAI 2025, License: CC BY-SA 4.0
Abstract: Single-cell transcriptomics describes complex molecular features at the level of individual cells and serves various roles in biological research, such as enhancing gene expression and predicting drug responses. Because transcriptomic data structurally resemble sequential data, many researchers have trained transformers on extensive transcriptomic datasets. However, these works have consistently neglected the intrinsic properties of the data and the appropriateness of the chosen model architecture. In this paper, we carefully investigate the nature of transcriptomic data and identify three overlooked problems: 1) the long-tailed data problem, 2) the model selection problem, and 3) the evaluation problem. By applying a weighted sampling strategy, we address the long-tailed data problem and achieve consistent improvement across all settings. By adapting different model structures to transcriptomic data, we find that transformers are not the only option. By developing three downstream tasks and fair evaluation metrics, we establish a simple and comprehensive benchmark for validating the effectiveness of models for transcriptomics. Through extensive experiments, we clarify misunderstandings in traditional methods and provide competitive baselines, thereby paving the way for future research in this field.
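The abstract does not say how the weighted sampling strategy is defined, so the following is only a minimal sketch of one common way to counter a long-tailed distribution: inverse-frequency sampling. It assumes the long tail is over a categorical label such as cell type, which the abstract does not state. The function names and the `power` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def inverse_frequency_weights(labels, power=1.0):
    """Return per-sample sampling probabilities proportional to 1 / count(label)**power.

    Assumption: the long tail is over a categorical label (e.g. cell type).
    power=1.0 gives every class the same total probability mass;
    power=0.0 falls back to uniform sampling over samples.
    """
    labels = np.asarray(labels)
    # inverse[i] is the index of labels[i] in the sorted unique classes
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    per_class = 1.0 / counts.astype(float) ** power
    weights = per_class[inverse]
    return weights / weights.sum()

def weighted_sample(labels, n_samples, seed=0, power=1.0):
    """Draw sample indices with replacement using inverse-frequency weights."""
    rng = np.random.default_rng(seed)
    probs = inverse_frequency_weights(labels, power)
    return rng.choice(len(probs), size=n_samples, replace=True, p=probs)

if __name__ == "__main__":
    # Toy long-tailed label set: one frequent class, one rare class.
    toy_labels = ["common"] * 90 + ["rare"] * 10
    idx = weighted_sample(toy_labels, n_samples=1000)
    drawn = np.asarray(toy_labels)[idx]
    print("fraction of rare class in the draw:", np.mean(drawn == "rare"))
```

With `power=1.0` each class is drawn about equally often regardless of its size. Intermediate values of `power` trade off between the original distribution and a fully balanced one.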