VirtualCeLLM: A Comprehensive Benchmark and Guidance for Large Language Models in Cellular Biology

Published: 28 May 2026, Last Modified: 28 May 2026, ICML 2026 FM4LS Workshop Poster, CC BY 4.0
Keywords: AIVC, LLMs
TL;DR: A Comprehensive Benchmark and Guidance for Large Language Models in Cellular Biology.
Abstract: High-throughput single-cell sequencing has enabled large-scale cellular profiling and accelerated the development of single-cell foundation models. While these models learn general-purpose representations from transcriptomic data, they remain limited in modality coverage, causal reasoning, and interpretability, falling short of the vision of an Artificial Intelligence Virtual Cell (AIVC). In contrast, large language models (LLMs) offer a complementary paradigm: they unify heterogeneous inputs, adapt to diverse tasks via prompting, and produce human-readable rationales, making them promising building blocks for AIVC. Recent efforts have applied LLMs to problems such as cell type annotation and perturbation-related reasoning, yet a systematic evaluation is still missing. To bridge this gap, we introduce VirtualCeLLM, a comprehensive benchmark for evaluating LLMs in cellular biology. VirtualCeLLM spans tasks across gene-, cell-, and omics-level analyses, evaluates 15 models, including open-source, closed-source, and biology-specialized LLMs, and supports diverse metrics under multiple settings. As a cross-scale, reproducible, and extensible benchmark, VirtualCeLLM enables consistent tracking of progress, facilitates principled comparisons, and provides actionable insights to accelerate LLM-powered virtual cell modeling.
Submission Number: 39