Keywords: Multi-omics Large Language Models
TL;DR: We benchmark MOLLMs and introduce MOE, a design that aligns multi-omics encoders with an LLM; it achieves superior performance and narrows the gap to specialist models across nine tasks.
Abstract: Understanding living systems requires interpretable models to elucidate how multi-omics data coordinate transcription and translation across spatiotemporal scales.
Inspired by large language models (LLMs), biological foundation models pretrained on omics sequences have shown exciting performance. However, these biological models lack interpretability and transparency in explaining their results. Motivated by advances in cross-modal alignment from vision–language models (VLMs), it is natural to integrate multi-omics data and natural language into one system: the multi-omics large language model (MOLLM), an LLM-based model that can understand multi-omics data. To understand the trends, challenges, and limitations of MOLLMs, we provide a comprehensive empirical study. We systematically review recent progress on MOLLMs based on their omics-encoding design and benchmark the performance gap between MOLLMs and omics-specific models. Extensive experiments show that the proposed multi-omics-encoding design outperforms existing MOLLMs by a large margin and shows promise for narrowing the performance gap against specialist biological models. Code is available at \href{https://anonymous.4open.science/r/BioMLLM_V2-B5E2}{https://anonymous.4open.science/r/mollm}.
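To make the alignment idea concrete, below is a minimal sketch of the common VLM-style recipe the abstract alludes to: embeddings from a (typically frozen) pretrained omics encoder are passed through a small projector into the LLM's token-embedding space and concatenated with text embeddings. This is an illustrative assumption, not the paper's actual architecture; all names (`OmicsProjector`), dimensions, and the two-layer MLP projector are hypothetical.

```python
import torch
import torch.nn as nn

class OmicsProjector(nn.Module):
    """Hypothetical projector mapping omics-encoder embeddings
    into an LLM's token-embedding space (VLM-style alignment)."""
    def __init__(self, omics_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(omics_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, omics_embeddings: torch.Tensor) -> torch.Tensor:
        # omics_embeddings: (batch, seq_len, omics_dim) from a pretrained omics encoder
        return self.proj(omics_embeddings)  # (batch, seq_len, llm_dim)

# Illustrative usage: prepend projected omics "tokens" to text token
# embeddings before feeding the combined sequence to the LLM backbone.
batch, n_omics, n_text = 2, 16, 32
omics_feats = torch.randn(batch, n_omics, 512)   # placeholder encoder output
text_embeds = torch.randn(batch, n_text, 4096)   # placeholder LLM embeddings
projector = OmicsProjector(omics_dim=512, llm_dim=4096)
llm_inputs = torch.cat([projector(omics_feats), text_embeds], dim=1)
print(llm_inputs.shape)  # torch.Size([2, 48, 4096])
```

A multi-omics-encoding design of this kind would use one such encoder–projector pair per omics modality, with the LLM attending over the concatenated sequence.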
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 934