scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery

Yiming Gao; Zhen Wang; Jefferson Chen; Mark Antkowiak; Mengzhou Hu; JungHo Kong; Dexter Pratt; Jieyuan Liu; Enze Ma; Zhiting Hu; Eric P. Xing

Published: 18 Sept 2025, Last Modified: 31 Jan 2026
Venue: NeurIPS 2025 poster
License: CC BY 4.0

Keywords: Large Language Models, Scientific Reasoning, Automation of Science, Single-cell Transcriptomics, Cell Type Annotation, Trajectory Inference, Gene Regulatory Network Inference

TL;DR: We present scPilot, a framework for omics-native reasoning in which a large language model converses, invokes bioinformatics tools, and iteratively explains its decisions during single-cell analysis.

Abstract: We present scPilot, the first systematic framework to practice omics-native reasoning: a large language model (LLM) converses in natural language while directly inspecting single-cell RNA-seq data and invoking bioinformatics tools on demand. scPilot converts core single-cell analyses, i.e., cell-type annotation, developmental-trajectory reconstruction, and transcription-factor targeting, into step-by-step reasoning problems that the model must solve, justify, and, when needed, revise with new evidence. To measure progress, we release scBench, a suite of 9 expertly curated datasets and graders that faithfully evaluate the omics-native reasoning capability of scPilot across various LLMs. Relative to one-shot prompting, iterative omics-native reasoning raises average cell-type annotation accuracy by 11% with o1 and cuts trajectory graph-edit distance by 30% with Gemini 2.5 Pro, while generating transparent reasoning traces that explain marker-gene ambiguity and regulatory logic. By grounding LLMs in raw omics data, scPilot enables auditable, interpretable, and diagnostically informative single-cell analyses.
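The abstract reports trajectory quality as a reduction in graph-edit distance. As a rough illustration of how such a metric and its relative reduction can be computed, the Python sketch below scores a predicted trajectory against a reference. It is not the benchmark's actual grader: it assumes each node is a unique cell-type label and restricts edits to unit-cost node and edge insertions and deletions (no relabeling), so the distance reduces to counting nodes and edges that appear in only one graph. The function names and the toy lineage are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]  # (parent cell type, child cell type)


def label_matched_edit_distance(
    pred_edges: Iterable[Edge],
    ref_edges: Iterable[Edge],
    pred_nodes: Optional[Iterable[str]] = None,
    ref_nodes: Optional[Iterable[str]] = None,
) -> int:
    """Unit-cost edit distance between two directed trajectory graphs.

    Hypothetical simplification: nodes are matched by cell-type label and
    relabeling is not allowed, so every node or edge that appears in only
    one graph costs exactly one insertion or deletion.
    """
    pred: Set[Edge] = set(pred_edges)
    ref: Set[Edge] = set(ref_edges)
    # Node sets are the labels on the edges plus any explicitly listed
    # (possibly isolated) nodes.
    p_nodes = {n for e in pred for n in e} | set(pred_nodes or ())
    r_nodes = {n for e in ref for n in e} | set(ref_nodes or ())
    node_cost = len(p_nodes ^ r_nodes)  # symmetric difference of node sets
    edge_cost = len(pred ^ ref)         # symmetric difference of edge sets
    return node_cost + edge_cost


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional drop in distance, e.g. 0.30 means a 30% reduction."""
    if baseline == 0:
        raise ValueError("baseline distance must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy hematopoietic lineage (illustrative only, not benchmark data).
    reference = [("HSC", "MPP"), ("MPP", "CMP"), ("MPP", "CLP"), ("CMP", "Erythrocyte")]
    one_shot = [("HSC", "CMP"), ("CMP", "MPP"), ("MPP", "CLP"), ("CLP", "Erythrocyte")]
    iterative = [("HSC", "MPP"), ("MPP", "CMP"), ("CMP", "CLP"), ("CMP", "Erythrocyte")]

    d_one = label_matched_edit_distance(one_shot, reference)    # → 6
    d_iter = label_matched_edit_distance(iterative, reference)  # → 2
    print(d_one, d_iter)                                        # prints "6 2"
    print(f"{relative_reduction(d_one, d_iter):.2f}")           # prints "0.67"
```

Because node correspondence is fixed by label, this version is cheap to compute. A general graph edit distance that also permits node relabeling searches over node mappings and can only give an equal or smaller value.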

Primary Area: Machine learning for sciences (e.g. climate, health, life sciences, physics, social sciences)

Flagged For Ethics Review: true

Submission Number: 23137
