LENS: LLM-based Enrichment of Nested Subclusters

Alisha Saboowala; Ping Wu; Yogesh Pandit; David Richmond; Jan-Christian Huetter; Avtar Singh; Vladimir Ermakov

Published: 02 Mar 2026, Last Modified: 08 May 2026. Venue: MLGenX 2026 Tinypaper track. License: CC BY 4.0

Abstract: Recent studies show that large language models (LLMs) can accurately annotate curated gene sets. However, real-world biological data are often noisy, complicating clustering and requiring manual curation. We evaluate the ability of LLMs to refine noisy, geometry-derived gene clusters into biologically meaningful components. We introduce LENS (LLM-based Enrichment of Nested Subclusters), a hybrid framework that combines geometric clustering with LLM-based reasoning to decompose gene clusters into interpretable, nested, and overlapping biological submodules. By leveraging biological information encoded in LLM training corpora, LENS provides a scalable complement to traditional enrichment approaches and improves biological specificity in complex, high-dimensional datasets.

Submission Number: 61
