Keywords: Large Language Models, AI Virtual Cell
Abstract: High-throughput single-cell sequencing has enabled large-scale cellular profiling and spurred the development of single-cell foundation models. These models, typically pretrained on transcriptomic data, learn general-purpose cellular representations but remain limited in modality coverage, causal reasoning, and interpretability, thus falling short of the vision of an Artificial Intelligence Virtual Cell (AIVC). In parallel, large language models (LLMs) have demonstrated strong potential for unifying heterogeneous modalities, adapting to diverse tasks, and generating interpretable reasoning chains in natural language, making them promising candidates for realizing an AIVC. Recent progress in applying LLMs to tasks such as cell annotation and perturbation prediction highlights this potential, yet key challenges persist, including insufficient task coverage, narrow evaluation metrics, and limited robustness to input and prompting factors. To address these gaps, we introduce \textbf{CeLLM}, a comprehensive benchmarking framework for evaluating \textbf{LLM}s in the \textbf{CeLL}ular domain. CeLLM covers a broad spectrum of tasks spanning gene-, cell-, and omics-level analyses; systematically assesses 15 open-source, proprietary, and biology-specialized models; and incorporates diverse evaluation criteria under multiple task settings. As a cross-scale, reproducible, and dynamic benchmark, CeLLM provides a sustainable platform to track progress, foster methodological innovation, and accelerate the development of LLMs toward virtual cell modeling.
Primary Area: datasets and benchmarks
Submission Number: 6289