Can Lightweight LLM Agents Improve Spatial Transcriptomics Annotation?

ACL ARR 2026 January Submission 9180 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Spatial transcriptomics, large language models, agentic reasoning, low-compute AI, spatial clustering, biological interpretability, post-hoc refinement, spatial coherence
Abstract: Spatial transcriptomics (ST) enables the study of tissue organization by linking gene expression to spatial context, yet automated annotation of spatial regions remains challenging. While recent work has explored large language models (LLMs) for biological reasoning, their utility in low-compute, locally deployable settings is poorly understood. We study whether lightweight, open-weight LLMs can improve ST region annotation when used as constrained post-hoc reviewers rather than standalone predictors. Our approach combines deterministic rule-based heuristics, prototype-derived neighborhood summaries, and a tri-role LLM review process (Analyst–Consensus–Reviewer) that is selectively invoked for ambiguous regions. We evaluate single- and multi-stage variants across six STARmap and MERFISH datasets using standard clustering and spatial coherence metrics (NMI, ARI, CHAOS, ASW). Results show that small models such as \texttt{Llama 3.2} and \texttt{Qwen3} match deterministic baselines in clustering accuracy on average across datasets, while consistently improving spatial coherence and interpretability. These findings suggest that lightweight LLM components can serve as resource-efficient, coherence-aware modules in spatial omics annotation pipelines.
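The clustering metrics named in the abstract (NMI and ARI) can be computed directly with scikit-learn; a minimal sketch, assuming integer-coded region labels and that the paper uses these standard implementations (the spatial metrics CHAOS and ASW are ST-specific and not shown here):

```python
# Hedged sketch: scoring predicted region labels against reference
# annotations with NMI and ARI, two of the metrics from the abstract.
from sklearn.metrics import (
    normalized_mutual_info_score,
    adjusted_rand_score,
)

def clustering_scores(true_labels, pred_labels):
    """Return (NMI, ARI) for two flat label assignments.

    Both metrics are invariant to label permutation, so cluster IDs
    need not match between the two assignments.
    """
    nmi = normalized_mutual_info_score(true_labels, pred_labels)
    ari = adjusted_rand_score(true_labels, pred_labels)
    return nmi, ari

# Identical partitions under a label swap score 1.0 on both metrics.
nmi, ari = clustering_scores([0, 0, 1, 1], [1, 1, 0, 0])
# → nmi == 1.0, ari == 1.0
```

Because both scores are permutation-invariant, they are suitable for comparing an LLM-refined partition against a deterministic baseline without aligning cluster IDs first.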
Paper Type: Short
Research Area: Clinical and Biomedical Applications
Research Area Keywords: LLM agents, multimodal reasoning, spatial transcriptomics, biological data annotation, low-resource models, interpretability, language grounding, agentic coordination, open-source LLMs, scientific NLP applications
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings / efficiency, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: English
Submission Number: 9180