Keywords: Large language models, single-cell biology, multimodal learning, language grounding, biological ontologies, agentic AI, foundation models, bioinformatics, cross-modal reasoning, interpretability
Abstract: Large language models (LLMs) and emerging agentic frameworks are beginning to influence single-cell biology by enabling natural-language interfaces, generative annotation, and multimodal data integration.
However, progress remains fragmented across data modalities, model families, and evaluation practices.
LLM4Cell presents a unified survey of 58 foundation and agentic models developed for single-cell research, spanning RNA, ATAC, multi-omic, and spatial modalities.
We organize these methods into five families (foundation, text-bridge, spatial/multimodal, epigenomic, and agentic) and map them to eight key analytical tasks, including annotation, trajectory inference, perturbation modeling, and drug-response prediction.
Drawing on over 40 public datasets, we analyze benchmark coverage, data diversity, and ethical or scalability constraints, and synthesize reported capabilities across ten domain-level dimensions related to biological grounding, multimodal alignment, fairness, privacy, and interpretability.
By explicitly linking datasets, modeling paradigms, and evaluation domains, LLM4Cell provides an integrated perspective on language-driven single-cell analysis and highlights open challenges in standardization, interpretability, and trustworthy model development.
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: Large Language Models, Single-Cell Biology, Multimodal Learning, Agentic Systems, Biomedical NLP, Survey and Taxonomy, Trustworthy AI
Contribution Types: Surveys
Languages Studied: English
Submission Number: 2928