PETRI: Learning Unified Cell Embeddings from Unpaired Modalities via Early-Fusion Joint Reconstruction
Keywords: high-content screening, cell biology, single cell, transcriptomics, microscopy, multimodal
Abstract: Integrating imaging and transcriptomics screening data holds promise for isolating true biological signals from modality-specific technical artifacts. However, existing multimodal embedding approaches either require paired measurements or fail to capture both shared and modality-specific information in an end-to-end manner. We present PETRI, an early-fusion transformer that learns a unified cell embedding from unpaired cellular images and gene expression profiles. PETRI groups cells by shared experimental context into multimodal “documents” and performs masked joint reconstruction with cross-modal attention, permitting information sharing while preserving modality-specific capacity. The resulting latent space supports construction of perturbation-level profiles by simple averaging across modalities. Applying sparse autoencoders to the embeddings reveals learned concepts that are biologically meaningful, multimodal, and retain perturbation-specific effects. To support further machine learning research, we release a blinded, matched optical pooled screen (OPS) and Perturb-seq dataset in HepG2 cells.
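The core mechanism the abstract describes, grouping unpaired cells from a shared experimental context into one multimodal “document” and training a shared transformer with masked joint reconstruction, can be sketched as below. This is only an illustrative sketch, not PETRI's actual implementation: all module names, dimensions, the masking scheme, and the reconstruction loss are assumptions made for the example.

```python
# Illustrative sketch of early-fusion joint reconstruction over a multimodal
# "document" of unpaired cells. Hypothetical architecture; dimensions, masking
# ratio, and loss are placeholder choices, not the paper's configuration.
import torch
import torch.nn as nn


class EarlyFusionJointReconstructor(nn.Module):
    def __init__(self, img_dim=512, expr_dim=2000, d_model=256,
                 n_heads=8, n_layers=4):
        super().__init__()
        # Modality-specific tokenizers project each cell profile to d_model.
        self.img_proj = nn.Linear(img_dim, d_model)
        self.expr_proj = nn.Linear(expr_dim, d_model)
        # Learned modality-type embeddings keep modality-specific capacity.
        self.modality_emb = nn.Embedding(2, d_model)  # 0 = image, 1 = expression
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        # One shared encoder: full self-attention over the mixed-modality
        # token sequence is what makes the fusion "early" and cross-modal.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Per-modality heads reconstruct the original inputs.
        self.img_head = nn.Linear(d_model, img_dim)
        self.expr_head = nn.Linear(d_model, expr_dim)

    def forward(self, img_cells, expr_cells, mask_ratio=0.25):
        # img_cells: (B, n_img, img_dim); expr_cells: (B, n_expr, expr_dim).
        # Cells are unpaired: n_img and n_expr may differ, and no row of one
        # modality corresponds to a row of the other; they share only the
        # experimental context that defines the document.
        img_tok = self.img_proj(img_cells) + self.modality_emb.weight[0]
        expr_tok = self.expr_proj(expr_cells) + self.modality_emb.weight[1]
        tokens = torch.cat([img_tok, expr_tok], dim=1)  # one "document"

        # Randomly mask cell tokens; masked cells must be reconstructed from
        # the remaining cells of both modalities via cross-modal attention.
        mask = torch.rand(tokens.shape[:2], device=tokens.device) < mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)

        latent = self.encoder(tokens)  # unified per-cell embeddings

        # Joint reconstruction loss, computed only on masked positions.
        n_img = img_cells.shape[1]
        img_rec = self.img_head(latent[:, :n_img])
        expr_rec = self.expr_head(latent[:, n_img:])
        m_img, m_expr = mask[:, :n_img], mask[:, n_img:]
        loss = (nn.functional.mse_loss(img_rec[m_img], img_cells[m_img])
                + nn.functional.mse_loss(expr_rec[m_expr], expr_cells[m_expr]))
        return latent, loss
```

Under these assumptions, the perturbation-level profiling step the abstract mentions would amount to averaging the per-cell embeddings of a document across both modalities, e.g. `latent.mean(dim=1)`; a sparse autoencoder for concept discovery would then be trained on these pooled or per-cell embeddings.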
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 23209