Multimodal Language Modeling for High-Accuracy Single Cell Transcriptomics Analysis and Generation

ACL ARR 2025 February Submission 4072 Authors

15 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Pre-trained language models (PLMs) have revolutionized scientific research, yet their application to single-cell analysis remains limited. Text PLMs cannot process single-cell RNA sequencing data, while cell PLMs lack the ability to handle free text, restricting their use in multimodal tasks. Existing efforts to bridge these modalities often suffer from information loss or inadequate single-modal pre-training, leading to suboptimal performance. To address these challenges, we propose the **S**ingle-**C**ell **M**ulti**M**odal **G**enerative **P**re-trained **T**ransformer (**scMMGPT**), a unified PLM for joint cell and text modeling. scMMGPT effectively integrates state-of-the-art cell and text PLMs, facilitating cross-modal knowledge sharing for improved performance. To bridge the text-cell modality gap, scMMGPT leverages dedicated cross-modal projectors and undergoes extensive pre-training on 27 million cells, the largest dataset for multimodal cell-text PLMs to date. This large-scale pre-training enables scMMGPT to excel in joint cell-text tasks, achieving an 84\% relative improvement in textual discrepancy for cell description generation, 20.5\% higher accuracy for cell type annotation, and a 4\% improvement in $k$-NN accuracy for text-conditioned pseudo-cell generation, outperforming baselines. Our code is available [here](https://anonymous.4open.science/r/scMMGPT-6DDB/).
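The abstract states that scMMGPT bridges the text-cell modality gap with dedicated cross-modal projectors. The paper defines the actual architecture; the snippet below is only a minimal, hypothetical sketch of what such a projector could look like, i.e. a small MLP that maps pooled cell-PLM embeddings into the text PLM's input embedding space. The class name, MLP structure, and all dimensions are illustrative assumptions, not details taken from scMMGPT.

```python
import torch
import torch.nn as nn

class CrossModalProjector(nn.Module):
    """Hypothetical projector: maps embeddings from one modality's space
    into another's (e.g. cell-PLM outputs -> text-PLM embedding space).
    Dimensions and depth are placeholders, not scMMGPT's actual values."""

    def __init__(self, in_dim: int, out_dim: int, hidden_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Cell -> text direction: project pooled cell embeddings so they can be
# consumed by the text PLM (e.g. as soft prompt tokens). Assumed sizes:
# 512-d cell embeddings, 4096-d text embeddings.
cell_to_text = CrossModalProjector(in_dim=512, out_dim=4096)
cell_embeddings = torch.randn(8, 512)        # batch of pooled cell-PLM outputs
soft_tokens = cell_to_text(cell_embeddings)  # shape: (8, 4096)
```

A second projector in the opposite direction (text to cell) would follow the same pattern; how the projected embeddings are actually consumed during scMMGPT's joint pre-training is specified in the paper itself.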
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: cross-modal pretraining, multimodal applications
Languages Studied: English, Single Cell Transcriptomics
Submission Number: 4072