MutEmbed: Self-Supervised Learning of Biological Latent Embeddings from Cancer Mutational Profiles

Published: 05 Mar 2025, Last Modified: 24 Apr 2025, MLGenX 2025 TinyPapers, CC BY 4.0
Track: Tiny paper track (up to 4 pages)
Abstract: Cancer genomes exhibit diverse mutational patterns across multiple profiles, including single base substitutions (SBS), small insertions and deletions (ID), copy number variations (CN), and structural variants (SV). These profiles provide distinct yet complementary perspectives on a tumor's genomic landscape, an understanding of which is essential for optimal patient care. Learning unified representations across this complex mutational landscape can reveal deeper insights into cancer biology, therapeutic interventions, and patient stratification. We present MutEmbed, a self-supervised framework that uses attention mechanisms to weigh and integrate information across mutational profiles, capturing their latent biological interdependencies. We use SBS, ID, CN, and SV calls for samples from the Pan-Cancer Analysis of Whole Genomes (PCAWG) dataset (n = 2748). Using MutEmbed, we derive an embedding for each sample and demonstrate the biological relevance of these embeddings by analyzing cancer-type-specific clustering patterns and enrichment patterns associated with DNA damage and repair pathway activities.
Submission Number: 27
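
To make concrete the abstract's idea of using attention to weigh and integrate the four mutational profiles, here is a minimal sketch in Python (NumPy). It is not the authors' architecture. The channel counts (96, 83, 48, 32), the embedding width, the tanh projections, the single learned query vector, and the function name attention_fuse are all assumptions made for illustration. The weights are random and untrained, and the self-supervised objective is not shown because the abstract does not specify it.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(profiles, projections, query):
    """Fuse per-profile count vectors into one sample embedding.

    profiles:    dict name -> 1D array of mutation counts (hypothetical layout)
    projections: dict name -> (n_channels, d) matrix, one per profile
    query:       (d,) vector used to score each projected profile
    """
    names = sorted(profiles)
    hidden = []
    for name in names:
        counts = profiles[name]
        # Normalize counts to proportions so mutation burden does not dominate.
        props = counts / max(counts.sum(), 1.0)
        hidden.append(np.tanh(props @ projections[name]))
    H = np.stack(hidden)                        # (n_profiles, d)
    scores = H @ query / np.sqrt(H.shape[1])    # scaled dot-product scores
    weights = softmax(scores)                   # one attention weight per profile
    embedding = weights @ H                     # weighted sum, shape (d,)
    return embedding, dict(zip(names, weights))

# Assumed channel counts per profile; the abstract does not state them.
channels = {"SBS": 96, "ID": 83, "CN": 48, "SV": 32}
d = 16  # assumed embedding width

rng = np.random.default_rng(0)
sample = {n: rng.poisson(3.0, size=k).astype(float) for n, k in channels.items()}
projections = {n: rng.normal(0.0, 1.0, size=(k, d)) for n, k in channels.items()}
query = rng.normal(0.0, 1.0, size=d)

embedding, weights = attention_fuse(sample, projections, query)
print(embedding.shape)
print(weights)
```

In a trained model the projections and query would be learned, and the per-profile attention weights could then be inspected to see which profile contributes most to a given sample's embedding.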