CellxPert: Inference-Time MCMC Steering of a Multi-Omics Single-Cell Foundation Model for In-Silico Perturbation

Published: 02 Mar 2026 · Last Modified: 17 Apr 2026 · MLGenX 2026 Poster · CC BY 4.0
Abstract: In this work, we introduce CellxPert, a scalable multimodal foundation model that unifies single-cell and spatial multi-omics within a common representation space. CellxPert jointly encodes transcriptomic (scRNA-seq), chromatin-accessibility (ATAC-seq), and surface-proteomic (CITE-seq) measurements, while directly incorporating MERFISH and imaging mass-cytometry data as 2D or 3D spatial-visual layers. CellxPert facilitates four key downstream tasks out of the box: (i) cell-type annotation across a broad ontology of 154 largely overlapping identities, the largest label space addressed to date and a stringent test of fine-grained discrimination, (ii) efficient fine-tuning using Low Rank Adaptation (LoRA), (iii) genome-wide transcriptomic response prediction to in-silico perturbations (ISP), and (iv) seamless multi-omic integration across various assays and platforms. Unlike current single-cell foundation models, which approximate gene perturbations by deleting or reordering tokenized gene expression ranks, CellxPert employs a Metropolis–Hastings sampler whose proposal kernel uses the model's masked conditional distributions to transition to new transcriptomic states conditioned on the perturbed genes. This Markov-chain procedure mitigates out-of-distribution artifacts introduced by abrupt token manipulation and produces trajectories that are biologically interpretable. Evaluations on PBMC68K, Replogle Perturb-seq, Systema, and BMMC benchmarks show that CellxPert surpasses classical and state-of-the-art baselines in cell type annotation, perturbation response prediction, and multi-omic integration.
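The abstract's core algorithmic claim, an MCMC procedure whose proposal kernel is the model's own masked conditional distribution, conditioned on clamped (perturbed) genes, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the gene names, the bin discretization, and the `masked_conditional` toy model are all hypothetical stand-ins for queries to the foundation model's masked-token head.

```python
import math
import random

BINS = 5  # hypothetical discretized expression levels 0..4
GENES = ["GATA1", "TAL1", "KLF1", "SPI1"]  # illustrative gene panel
PERTURBED = {"GATA1": 0}  # in-silico knockout: clamp GATA1 to the lowest bin

def masked_conditional(state, gene):
    """Toy stand-in for p(expression bin | rest of state).
    A real sampler would query the model's masked conditional distribution."""
    others = [v for g, v in state.items() if g != gene]
    center = sum(others) / len(others)  # correlated genes pull toward a common level
    weights = [math.exp(-((b - center) ** 2) / 2.0) for b in range(BINS)]
    z = sum(weights)
    return [w / z for w in weights]

def mh_step(state, rng):
    """One Metropolis-Hastings transition: resample a random unperturbed gene
    from the masked conditional. When the proposal kernel q equals the model's
    own conditional p, the MH acceptance ratio
        alpha = min(1, [p(new|rest) * q(old|rest)] / [p(old|rest) * q(new|rest)])
    is identically 1, i.e. the move reduces to a Gibbs update."""
    gene = rng.choice([g for g in GENES if g not in PERTURBED])
    probs = masked_conditional(state, gene)
    state[gene] = rng.choices(range(BINS), weights=probs)[0]
    return state

rng = random.Random(0)
state = {g: 2 for g in GENES}
state.update(PERTURBED)              # condition on the perturbed genes
for _ in range(100):
    state = mh_step(state, rng)
    state.update(PERTURBED)          # keep perturbed genes clamped each step
print(state)                         # a transcriptomic state sampled along the chain
```

Because each transition stays on the model's learned manifold (states are drawn from its own conditionals rather than produced by token deletion or rank reordering), the resulting trajectory avoids the out-of-distribution artifacts the abstract attributes to abrupt token manipulation.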
Track: Main track
AI Policy Confirmation: I confirm that this submission clearly discloses the role of AI systems and human contributors and complies with the ICLR 2026 Policies on Large Language Model Usage and the ICLR Code of Ethics.
Submission Number: 45