Keywords: Large Language Models; Instruction Tuning; Zero-shot Learning; Bioinformatics
TL;DR: We propose ZerOmics, the first zero-shot method that guides Large Language Models to perform various single-cell multi-omics analysis tasks without relying on specific downstream data.
Abstract: A variety of analysis tasks in single-cell (SC) multi-omics are crucial for precision medicine and clinical research. To address these tasks, existing methods are typically pre-trained on large-scale datasets to obtain general representations and then fine-tuned on task-specific labeled datasets. However, their task-specific heads often lack generalizability, significantly limiting performance in zero-shot scenarios. Inspired by the success of large language models (LLMs), we propose ZerOmics, the first zero-shot method that guides LLMs to perform various SC tasks without relying on specific downstream data. To enable LLMs to establish a correct and comprehensive understanding of SC data, ZerOmics employs a dual-alignment strategy. Specifically, ZerOmics first aligns SC expression data with a well-organized gene corpus, thereby generating robust SC embeddings. These embeddings are then incorporated into instructions designed for various SC analysis tasks to tune the LLM, achieving alignment between the SC data and the LLM. Extensive experiments across various sequencing technologies and tissues demonstrate that ZerOmics provides a comprehensive and general solution for SC analysis, achieving performance comparable to or even surpassing that of state-of-the-art (SOTA) supervised and fine-tuned methods.
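For concreteness, the dual-alignment strategy in the abstract can be read as a two-stage pipeline: (1) align cell-expression embeddings with embeddings of a gene-description corpus, then (2) inject the aligned cell embedding into task instructions as a soft token and tune the LLM. The sketch below is a minimal PyTorch illustration of one plausible instantiation, not the paper's actual implementation: the module names, dimensions, InfoNCE-style contrastive objective, and soft-token injection are all assumptions for illustration.

```python
# Minimal, hypothetical sketch of a dual-alignment pipeline in the spirit of
# the abstract. Module names, dimensions, and the InfoNCE-style objective are
# illustrative assumptions, not ZerOmics' published architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CellEncoder(nn.Module):
    """Maps a cell's gene-expression vector to a unit-norm embedding (assumed design)."""
    def __init__(self, n_genes: int, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_genes, 512), nn.ReLU(), nn.Linear(512, dim))

    def forward(self, expr: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(expr), dim=-1)

def corpus_alignment_loss(cell_emb, gene_text_emb, temperature=0.07):
    """Stage 1: contrastively align SC embeddings with paired gene-corpus
    text embeddings (InfoNCE is one possible choice of objective)."""
    logits = cell_emb @ gene_text_emb.t() / temperature
    targets = torch.arange(cell_emb.size(0), device=cell_emb.device)
    return F.cross_entropy(logits, targets)

def build_instruction_inputs(llm_embed, projector, cell_emb, prompt_ids):
    """Stage 2 (schematic): prepend the projected cell embedding to the token
    embeddings of a task instruction before instruction-tuning the LLM."""
    prompt_emb = llm_embed(prompt_ids)               # (batch, seq, d_model)
    cell_tok = projector(cell_emb).unsqueeze(1)      # (batch, 1, d_model)
    return torch.cat([cell_tok, prompt_emb], dim=1)  # cell token leads the prompt

# Example (stage 1): align 8 cells over 2000 genes with placeholder corpus embeddings.
enc = CellEncoder(n_genes=2000)
cells = enc(torch.randn(8, 2000))
texts = F.normalize(torch.randn(8, 256), dim=-1)
loss = corpus_alignment_loss(cells, texts)
```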
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2577