Abstract: Recent studies show that large language models (LLMs) can accurately annotate curated gene sets. However, real-world biological data are often noisy, complicating clustering and requiring manual curation. We evaluate the ability of LLMs to refine noisy, geometry-derived gene clusters into biologically meaningful components. We introduce LENS (LLM-based Enrichment of Nested Subclusters), a hybrid framework that combines geometric clustering with LLM-based reasoning to decompose gene clusters into interpretable, nested, and overlapping biological submodules. By leveraging biological information encoded in LLM training corpora, LENS provides a scalable complement to traditional enrichment approaches and improves biological specificity in complex, high-dimensional datasets.
Submission Number: 61