Bridging Gene Expression and Text: LLMs Can Complement Single-Cell Foundation Models

ICLR 2026 Conference Submission22428 Authors

20 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | Readers: Everyone | License: CC BY 4.0
Keywords: Large Language Models, Computational Biology, Multimodality, Interpretability, Single-Cell Analysis, Genomics
Abstract: Single-cell foundation models such as scGPT represent a significant advance in single-cell omics, achieving state-of-the-art performance on a variety of downstream biological tasks. However, these models are inherently limited: a vast amount of biological information exists as text, which they cannot leverage. Several recent works have therefore proposed LLMs as an alternative to single-cell foundation models, achieving competitive results. However, there is little understanding of what factors drive this performance, and these works focus strongly on LLMs as an alternative, rather than a complementary approach, to single-cell foundation models. In this study, we therefore investigate what biological insights contribute to the performance of LLMs when applied to single-cell data, and how these models can complement single-cell foundation models to improve on their performance. We first conduct a series of interpretability and ablation tests, which show that LLMs leverage marker gene knowledge and simple gene expression patterns, contributing to their competitive performance. We then introduce scMPT, a proof-of-concept model that combines single-cell representations from LLMs and single-cell foundation models, demonstrating synergies between these representations through stronger, more consistent performance across datasets and tasks. We also experiment with alternative fusion methods, which highlight the potential of combining specialized reasoning models with scGPT to improve performance. Ultimately, this study showcases the potential for LLMs to complement single-cell foundation models and drive improvements in single-cell analysis.
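Illustrative sketch (not the authors' implementation): the abstract does not say how scMPT fuses the two representations, so the following assumes the simplest possible late fusion, namely concatenating per-cell embeddings from an LLM and from a single-cell foundation model after L2 normalization, then classifying with a nearest-centroid rule. The function names (`fuse_embeddings`, `fit_centroids`, `predict`), the embedding sizes, and the random placeholder data are all hypothetical and stand in for real precomputed embeddings.

```python
import numpy as np


def fuse_embeddings(llm_emb, scfm_emb):
    """Late fusion by concatenation (an assumption, not the paper's stated method).

    Each modality is L2-normalized per cell first so that neither embedding
    dominates the fused vector purely because of its scale.
    """
    def l2norm(x):
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        return x / np.clip(norms, 1e-12, None)

    return np.concatenate([l2norm(llm_emb), l2norm(scfm_emb)], axis=1)


def fit_centroids(features, labels):
    """Compute one mean feature vector per class label."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(features, classes, centroids):
    """Assign each cell to the class with the nearest centroid."""
    # Squared Euclidean distance from every cell to every centroid.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


# Synthetic placeholder data: random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
n_cells = 200
labels = rng.integers(0, 3, size=n_cells)                    # hypothetical cell types
llm_emb = rng.normal(size=(n_cells, 16)) + labels[:, None]   # stand-in LLM embedding
scfm_emb = rng.normal(size=(n_cells, 8)) - labels[:, None]   # stand-in foundation-model embedding

fused = fuse_embeddings(llm_emb, scfm_emb)
classes, centroids = fit_centroids(fused, labels)
preds = predict(fused, classes, centroids)
```

With real data, `llm_emb` and `scfm_emb` would be replaced by embeddings of the same cells produced by the two models, and the nearest-centroid rule by whatever downstream classifier is preferred; the concatenation step is the only part meant to illustrate the idea of combining the two representations.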
Primary Area: foundation or frontier models, including LLMs
Submission Number: 22428