LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology

ACL ARR 2026 January Submission 2928 Authors

03 Jan 2026 (modified: 20 Mar 2026) · CC BY 4.0
Keywords: Large language models, single-cell biology, multimodal learning, language grounding, biological ontologies, agentic AI, foundation models, bioinformatics, cross-modal reasoning, interpretability
Abstract: Large language models (LLMs) and emerging agentic frameworks are beginning to influence single-cell biology by enabling natural-language interfaces, generative annotation, and multimodal data integration. However, progress remains fragmented across data modalities, model families, and evaluation practices. LLM4Cell presents a unified survey of 58 foundation and agentic models developed for single-cell research, spanning RNA, ATAC, multi-omic, and spatial modalities. We organize these methods into five families (foundation, text-bridge, spatial/multimodal, epigenomic, and agentic) and map them to eight key analytical tasks, including annotation, trajectory inference, perturbation modeling, and drug-response prediction. Drawing on over 40 public datasets, we analyze benchmark coverage, data diversity, and ethical and scalability constraints, and synthesize reported capabilities across ten domain-level dimensions related to biological grounding, multimodal alignment, fairness, privacy, and interpretability. By explicitly linking datasets, modeling paradigms, and evaluation domains, LLM4Cell provides an integrated perspective on language-driven single-cell analysis and highlights open challenges in standardization, interpretability, and trustworthy model development.
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: Large Language Models, Single-Cell Biology, Multimodal Learning, Agentic Systems, Biomedical NLP, Survey and Taxonomy, Trustworthy AI
Contribution Types: Surveys
Languages Studied: English
Submission Number: 2928