Bimodal masked language modeling for bulk RNA-seq and DNA methylation representation learning

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Multimodal Learning, Representation learning, bulk RNA-seq, DNA methylation, Cancer prognosis
TL;DR: We introduce MOJO, a model that learns joint representations of bulk RNA-seq and DNA methylation and achieves state-of-the-art performance in cancer-type classification and survival analysis.
Abstract: Oncologists increasingly rely on multiple data modalities to model the complexity of disease. Within this landscape, transcriptomic and epigenetic data have proven particularly instrumental and play an increasingly vital role in clinical applications. However, integrating them into multimodal models remains a challenge, especially given their high dimensionality. In this work, we present a novel bimodal model that jointly learns representations of bulk RNA-seq and DNA methylation, leveraging self-supervision from masked language modeling. We employ an architecture that reduces the memory footprint typically associated with purely transformer-based models on long sequences. We demonstrate that the resulting bimodal embeddings can be used to fine-tune cancer-type classification and survival models that achieve state-of-the-art performance compared to unimodal models. Furthermore, we introduce a robust learning framework that maintains downstream task performance despite missing modalities, enhancing the model's applicability in real-world clinical settings.
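The core self-supervision idea described in the abstract, masking tokens in each modality and training a model to recover them from the joint sequence, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the vocabulary size, sequence lengths, mask rate, and helper `mask_tokens` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: each sample is a pair of token sequences,
# one per modality (binned gene-expression values, binned CpG
# methylation values). Sizes here are illustrative only.
VOCAB_SIZE = 64
MASK_ID = VOCAB_SIZE  # dedicated [MASK] token id
N_GENES, N_CPGS = 10, 8

rna = rng.integers(0, VOCAB_SIZE, size=N_GENES)
meth = rng.integers(0, VOCAB_SIZE, size=N_CPGS)

def mask_tokens(tokens, mask_rate=0.15, rng=rng):
    """Return (masked input, targets); targets are -1 except at masked positions."""
    tokens = tokens.copy()
    targets = np.full_like(tokens, -1)
    n_mask = max(1, int(round(mask_rate * len(tokens))))
    idx = rng.choice(len(tokens), size=n_mask, replace=False)
    targets[idx] = tokens[idx]
    tokens[idx] = MASK_ID
    return tokens, targets

# Mask each modality independently, then concatenate into one bimodal
# input; a transformer-style encoder would be trained to predict the
# masked tokens from the joint sequence, so each modality can use
# cross-modal context to fill in the other's gaps.
rna_in, rna_tgt = mask_tokens(rna)
meth_in, meth_tgt = mask_tokens(meth)
bimodal_input = np.concatenate([rna_in, meth_in])
bimodal_target = np.concatenate([rna_tgt, meth_tgt])

print(bimodal_input.shape)          # → (18,)
print((bimodal_target >= 0).sum())  # number of masked positions to predict
```

The per-modality masking followed by concatenation is one plausible reading of "bimodal masked language modeling"; the paper's actual tokenization and memory-efficient encoder are not specified in this excerpt.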
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 12204